EdiTyper: a high-throughput tool for analysis of targeted sequencing data from genome editing experiments

Author(s):  
Alexandre Yahi ◽  
Paul Hoffman ◽  
Margot Brandt ◽  
Pejman Mohammadi ◽  
Nicholas P. Tatonetti ◽  
...  

Abstract Genome editing experiments are generating an increasing amount of targeted sequencing data with specific mutational patterns indicating the success of the experiments and the genotypes of clonal cell lines. We present EdiTyper, a high-throughput command line tool specifically designed for analysis of sequencing data from polyclonal and monoclonal cell populations from CRISPR gene editing. It requires simple inputs of sequencing data and reference sequences, and provides comprehensive outputs including summary statistics, plots, and SAM/BAM alignments. Analysis of simulated data showed that EdiTyper is highly accurate for detection of both single nucleotide mutations and indels, robust to sequencing errors, and fast and scalable to large experimental batches. EdiTyper is available on GitHub (https://github.com/LappalainenLab/edityper) under the MIT license.
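
To make the kind of classification concrete, below is a minimal, hypothetical sketch (in Python; not EdiTyper's actual code or API) of how a read aligned to the reference amplicon can be labeled as unedited, carrying a point mutation, or carrying an indel. The function name and alignment representation are assumptions for illustration only.

    # Hypothetical illustration only; EdiTyper's real implementation differs.
    def classify_read(ref_aln: str, read_aln: str) -> str:
        """ref_aln/read_aln: equal-length aligned strings, '-' marks a gap."""
        assert len(ref_aln) == len(read_aln)
        mismatches = sum(1 for r, q in zip(ref_aln, read_aln)
                         if r != q and r != '-' and q != '-')
        gaps = sum(1 for r, q in zip(ref_aln, read_aln) if r == '-' or q == '-')
        if gaps:
            return "indel"            # e.g. NHEJ-induced insertion/deletion
        if mismatches:
            return "point_mutation"   # e.g. HDR-introduced single nucleotide change
        return "unedited"

    # A read with a 1-bp deletion relative to the reference:
    print(classify_read("ACGTACGT", "ACG-ACGT"))  # -> "indel"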

2020 ◽  
Vol 36 (9) ◽  
pp. 2725-2730
Author(s):  
Keisuke Shimmura ◽  
Yuki Kato ◽  
Yukio Kawahara

Abstract Motivation Genetic variant calling with high-throughput sequencing data has been recognized as a useful tool for better understanding of disease mechanisms and detection of potential off-target sites in genome editing. Since most variant calling algorithms rely on initial mapping onto a reference genome and tend to predict many variant candidates, variant calling remains challenging in terms of predicting variants with few false positives. Results Here we present Bivartect, a simple yet versatile variant caller based on direct comparison of short sequence reads between normal and mutated samples. Bivartect can detect not only single nucleotide variants but also insertions/deletions, inversions and their complexes. Bivartect achieves high predictive performance with an elaborate memory-saving mechanism, which allows it to run on a single-node computer for analyzing small omics data. Tests with simulated benchmark and real genome-editing data indicate that Bivartect was comparable to state-of-the-art variant callers in positive predictive value for detection of single nucleotide variants, even though it yielded a substantially smaller number of candidates. These results suggest that Bivartect, a reference-free approach, will contribute to the identification of germline mutations as well as off-target sites introduced during genome editing with high accuracy. Availability and implementation Bivartect is implemented in C++ and available along with in silico simulated data at https://github.com/ykat0/bivartect. Supplementary information Supplementary data are available at Bioinformatics online.
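
As a rough illustration of the reference-free idea (a Python sketch under assumed names, not Bivartect's C++ implementation), one could flag k-mers that are well supported in the mutated sample but absent from the normal sample as candidate variant signatures:

    from collections import Counter

    def count_kmers(reads, k=21):
        counts = Counter()
        for read in reads:
            for i in range(len(read) - k + 1):
                counts[read[i:i + k]] += 1
        return counts

    def candidate_kmers(normal_reads, mutated_reads, k=21, min_support=2):
        """k-mers seen >= min_support times in the mutated sample but never in normal."""
        normal = count_kmers(normal_reads, k)
        mutated = count_kmers(mutated_reads, k)
        return {km for km, n in mutated.items()
                if n >= min_support and km not in normal}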


2019 ◽  
Vol 97 (Supplement_3) ◽  
pp. 56-56
Author(s):  
Michael Thomson

Abstract The precision and ease of use of CRISPR nucleases, such as Cas9 and Cpf1, for plant genome editing have the potential to accelerate a wide range of applications for crop improvement. For upstream research on gene discovery and validation, rapid gene knock-outs can enable testing of single genes and multi-gene families for functional effects. Large chromosomal deletions can test the function of tandem gene arrays and assist with positional cloning of QTLs by helping to narrow down the target region. Nuclease-deactivated Cas9 fusion proteins with transcriptional activators and repressors can be used to up- and down-regulate gene expression. Even more promising, gene insertions and allele replacements can provide the opportunity to rapidly test the effects of different alleles at key loci in the same genetic background, providing a more precise alternative to marker-assisted backcrossing. Recently, Texas A&M AgriLife Research has supported the development of a Crop Genome Editing Lab at Texas A&M that is working to optimize a high-throughput gene editing pipeline and provide an efficient and cost-effective gene editing service for research and breeding groups. The lab is using rice as a model to test and optimize new approaches aimed at overcoming current bottlenecks. For example, a wealth of genomics data from the rice community enables the development of novel approaches to predict which genes and target modifications may be most beneficial for crop improvement, taking advantage of known major genes, high-resolution GWAS data, multiple high-quality reference genomes, transcriptomics data, and resequencing data from the 3,000 Rice Genomes Project. Current projects have now expanded to work across multiple crops to provide breeding and research groups with a rapid gene editing pipeline to test candidate genes in their programs, with the ultimate goal of developing nutritious, high-yielding, stress-tolerant crops for the future.


2004 ◽  
Vol 50 (11) ◽  
pp. 2028-2036 ◽  
Author(s):  
Susan Bortolin ◽  
Margot Black ◽  
Hemanshu Modi ◽  
Ihor Boszko ◽  
Daniel Kobler ◽  
...  

Abstract Background: We have developed a novel, microsphere-based universal array platform referred to as the Tag-It™ platform. This platform is suitable for high-throughput clinical genotyping applications and was used for multiplex analysis of a panel of thrombophilia-associated single-nucleotide polymorphisms (SNPs). Methods: Genomic DNA from 132 patients was amplified by multiplex PCR using 6 primer sets, followed by multiplex allele-specific primer extension using 12 universally tagged genotyping primers. The products were then sorted on the Tag-It array and detected by use of the Luminex xMAP™ system. Genotypes were also determined by sequencing. Results: Empirical validation of the universal array showed that the highest nonspecific signal was 3.7% of the specific signal. Patient genotypes showed 100% concordance with direct DNA sequencing data for 736 SNP determinations. Conclusions: The Tag-It microsphere-based universal array platform is a highly accurate, multiplexed, high-throughput SNP-detection platform.


2017 ◽  
Author(s):  
Nima Nouri ◽  
Steven H. Kleinstein

Abstract Motivation During adaptive immune responses, activated B cells expand and undergo somatic hypermutation of their immunoglobulin (Ig) receptor, forming a clone of diversified cells that can be related back to a common ancestor. Identification of B cell clonotypes from high-throughput Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) data relies on computational analysis. Recently, we proposed an automated method to partition sequences into clonal groups based on single-linkage clustering of the Ig receptor junction region with a length-normalized Hamming distance metric. This method could identify clonally related sequences with high confidence on several benchmark experimental and simulated data sets. However, this approach was computationally expensive and unable to provide estimates of accuracy for new data. Here, a new method is presented that addresses this computational bottleneck and also provides a study-specific estimation of performance, including sensitivity and specificity. The method uses a finite mixture model fitting procedure to learn the parameters of two univariate curves that fit the bimodal distribution of distances between pairs of sequences. These distributions are used to estimate the performance of different threshold choices for partitioning sequences into clonotypes. These performance estimates are validated using simulated and experimental datasets. With this method, clonotypes can be identified from AIRR-seq data with sensitivity and specificity profiles that are user-defined based on the overall goals of the study. Availability Source code is freely available at the Immcantation Portal (www.immcantation.com) under the CC BY-SA 4.0 license. Contact: [email protected]
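
The thresholding idea can be sketched in a few lines. The code below is an illustrative approximation only: it uses a two-component Gaussian mixture from scikit-learn as a stand-in, whereas the paper fits its own univariate curves, and the function names are assumptions.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def normalized_hamming(a: str, b: str) -> float:
        assert len(a) == len(b)
        return sum(x != y for x, y in zip(a, b)) / len(a)

    def clonal_threshold(junctions):
        # pairwise length-normalized distances between equal-length junctions
        dists = [normalized_hamming(a, b)
                 for i, a in enumerate(junctions)
                 for b in junctions[i + 1:] if len(a) == len(b)]
        d = np.asarray(dists).reshape(-1, 1)
        gmm = GaussianMixture(n_components=2, random_state=0).fit(d)
        # pick the point where posterior mass shifts away from the
        # "clonally related" (lower-mean) component
        grid = np.linspace(0.0, 1.0, 1001).reshape(-1, 1)
        post = gmm.predict_proba(grid)
        related = int(np.argmin(gmm.means_.ravel()))
        return float(grid[np.argmax(post[:, related] < 0.5), 0])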


2019 ◽  
Vol 4 ◽  
pp. 145
Author(s):  
Matthew N. Wakeling ◽  
Thomas W. Laver ◽  
Kevin Colclough ◽  
Andrew Parish ◽  
Sian Ellard ◽  
...  

Multiple nucleotide variants (MNVs) are miscalled by the most widely utilised next-generation sequencing (NGS) analysis pipelines, presenting the potential for missed diagnoses that would previously have been made by standard Sanger (dideoxy) sequencing. These variants, which should be treated as a single insertion-deletion mutation event, are commonly called as separate single nucleotide variants. This can result in misannotation, incorrect amino acid predictions and potentially false positive and false negative diagnostic results. This risk will increase as confirmatory Sanger sequencing of single nucleotide variants (SNVs) ceases to be standard practice. Using simulated data and re-analysis of sequencing data from a diagnostic targeted gene panel, we demonstrate that the widely adopted GATK best practices pipeline results in miscalling of MNVs and that alternative tools can call these variants correctly. The adoption of calling methods that annotate MNVs correctly would be a solution for individual laboratories; however, GATK best practices are the basis for important public resources such as the gnomAD database. We suggest that integrating a solution into these guidelines would be the optimal approach.
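
The incorrect amino acid prediction is easy to see with a toy example (illustrative Python, not part of the article): two adjacent substitutions in the same codon translate differently when interpreted jointly as an MNV than when each SNV is annotated on its own.

    # Reference codon GAG (Glu); the true event changes the first two bases to AC.
    CODON = {"GAG": "Glu", "AAG": "Lys", "GCG": "Ala", "ACG": "Thr"}

    ref, mnv = "GAG", "ACG"            # joint MNV interpretation: Glu -> Thr
    snv1, snv2 = "AAG", "GCG"          # the same event called as two separate SNVs

    print("MNV annotation:     ", CODON[ref], "->", CODON[mnv])
    print("separate SNV calls: ", CODON[ref], "->", CODON[snv1], "and", CODON[snv2])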


2015 ◽  
Author(s):  
Rahul Reddy

As RNA-Seq and other high-throughput sequencing methods grow in use and remain critical for gene expression studies, technical variability in count data impedes differential expression analysis, comparison of data across samples and experiments, and reproduction of results. Studies like Dillies et al. (2013) compare several between-lane normalization methods involving scaling factors, while Hansen et al. (2012) and Risso et al. (2014) propose methods that correct for sample-specific bias or use sets of control genes to isolate and remove technical variability. This paper evaluates four normalization methods in terms of reducing intra-group technical variability and facilitating differential expression analysis or other research where the biological, inter-group variability is of interest. To this end, the four methods were evaluated in differential expression analysis between data from Pickrell et al. (2010) and Montgomery et al. (2010) and between simulated data modeled on these two datasets. Though the between-lane scaling factor methods perform worse on real data sets, they are much stronger for simulated data. We cannot reject the recommendation of Dillies et al. to use TMM and DESeq normalization, but further study of power to detect effects of different sizes under each normalization method is merited.
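
For reference, the DESeq scaling factors mentioned above are median-of-ratios size factors, which can be sketched as follows (a simplified illustration in Python; the reference implementations in DESeq/DESeq2 handle zero counts and filtering more carefully):

    import numpy as np

    def deseq_size_factors(counts):
        """counts: genes x samples matrix of raw read counts."""
        counts = np.asarray(counts, dtype=float)
        with np.errstate(divide="ignore"):
            log_counts = np.log(counts)
        # per-gene geometric mean across samples; genes containing zeros drop out
        log_geo_means = log_counts.mean(axis=1)
        usable = np.isfinite(log_geo_means)
        ratios = log_counts[usable] - log_geo_means[usable, None]
        return np.exp(np.median(ratios, axis=0))

    counts = [[100, 200], [50, 110], [10, 19]]
    print(deseq_size_factors(counts))   # ~[0.71, 1.41]: sample 2 was sequenced deeper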


2015 ◽  
Vol 8 (2) ◽  
pp. 192-199 ◽  
Author(s):  
Maulik R. Upadhyay ◽  
Anand B. Patel ◽  
Ramalingam B. Subramanian ◽  
Tejas M. Shah ◽  
Subhash J. Jakhesara ◽  
...  

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Marwan A. Hawari ◽  
Celine S. Hong ◽  
Leslie G. Biesecker

Abstract Background Somatic single nucleotide variants have gained increased attention because of their role in cancer development and the widespread use of high-throughput sequencing techniques. The necessity to accurately identify these variants in sequencing data has led to a proliferation of somatic variant calling tools. Additionally, the use of simulated data to assess the performance of these tools has become common practice, as there is no gold standard dataset for benchmarking performance. However, many existing somatic variant simulation tools are limited because they rely on generating entirely synthetic reads derived from a reference genome or because they do not allow for the precise customizability that would enable a more focused understanding of single nucleotide variant calling performance. Results SomatoSim is a tool that lets users simulate somatic single nucleotide variants in sequence alignment map (SAM/BAM) files with full control of the specific variant positions, number of variants, variant allele fractions, depth of coverage, read quality, and base quality, among other parameters. SomatoSim accomplishes this through a three-stage process: variant selection, where candidate positions are selected for simulation; variant simulation, where reads are selected and mutated; and variant evaluation, where SomatoSim summarizes the simulation results. Conclusions SomatoSim is a user-friendly tool that offers a high level of customizability for simulating somatic single nucleotide variants. SomatoSim is available at https://github.com/BieseckerLab/SomatoSim.
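
The core spike-in step can be illustrated schematically (hypothetical Python, not SomatoSim's code, which operates on BAM reads via the parameters listed above): mutate just enough reads covering a position to hit a target variant allele fraction.

    import random

    def spike_in(pileup_bases, target_vaf, alt_base, seed=0):
        """pileup_bases: bases observed at one position, one per covering read."""
        random.seed(seed)
        depth = len(pileup_bases)
        n_alt = round(depth * target_vaf)              # reads to mutate
        simulated = list(pileup_bases)
        for i in random.sample(range(depth), n_alt):
            simulated[i] = alt_base
        return simulated, simulated.count(alt_base) / depth

    bases, vaf = spike_in(["A"] * 100, target_vaf=0.05, alt_base="T")
    print(vaf)   # 0.05 at 100x coverage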


2018 ◽  
Author(s):  
Chang Xu ◽  
Xiujing Gu ◽  
Raghavendra Padmanabhan ◽  
Zhong Wu ◽  
Quan Peng ◽  
...  

Abstract Motivation Low-frequency DNA mutations are often confounded with technical artifacts from sample preparation and sequencing. With unique molecular identifiers (UMIs), most of the sequencing errors can be corrected. However, errors before UMI tagging, such as DNA polymerase errors during end-repair and the first PCR cycle, cannot be corrected with single-strand UMIs and impose fundamental limits on UMI-based variant calling. Results We developed smCounter2, a UMI-based variant caller for targeted sequencing data and an upgrade from the current version of smCounter. Compared to smCounter, smCounter2 features a lower detection limit of 0.5%, better overall accuracy (particularly in non-coding regions), a consistent threshold that can be applied to both deep and shallow sequencing runs, and easier use via a Docker image and code for read pre-processing. We benchmarked smCounter2 against several state-of-the-art UMI-based variant calling methods using multiple datasets and demonstrated smCounter2's superior performance in detecting somatic variants. At the core of smCounter2 is a statistical test to determine whether the allele frequency of the putative variant is significantly above the background error rate, which was carefully modeled using an independent dataset. The improved accuracy in non-coding regions was mainly achieved using novel repetitive region filters that were specifically designed for UMI data. Availability The entire pipeline is available at https://github.com/qiaseq/qiaseq-dna under the MIT license.
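
A stripped-down version of that kind of test (a one-sided binomial test of the observed alt-allele count against a fixed background error rate; smCounter2's actual statistical model is more elaborate) looks like this:

    from scipy.stats import binom

    def variant_pvalue(alt_count, depth, background_error_rate):
        """P(X >= alt_count) under Binomial(depth, background_error_rate)."""
        return binom.sf(alt_count - 1, depth, background_error_rate)

    # e.g. 12 of 1500 UMI-consensus reads support the variant, 0.1% background error
    print(f"{variant_pvalue(12, 1500, 0.001):.1e}")   # very small p-value; the site would be called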

