large indels
Recently Published Documents



Author(s):  
Stephen E. Lincoln ◽  
Tina Hambuch ◽  
Justin M. Zook ◽  
Sara L. Bristow ◽  
Kathryn Hatchell ◽  
...  

Abstract Purpose: To evaluate the impact of technically challenging variants on the implementation, validation, and diagnostic yield of commonly used clinical genetic tests. Such variants include large indels, small copy-number variants (CNVs), complex alterations, and variants in low-complexity or segmentally duplicated regions. Methods: An interlaboratory pilot study used synthetic specimens to assess detection of challenging variant types by various next-generation sequencing (NGS)–based workflows. One well-performing workflow was further validated and used in clinician-ordered testing of more than 450,000 patients. Results: In the interlaboratory study, only 2 of 13 challenging variants were detected by all 10 workflows, and just 3 workflows detected all 13. Limitations were also observed among 11 less-challenging indels. In clinical testing, 21.6% of patients carried one or more pathogenic variants, of which 13.8% (17,561) were classified as technically challenging. These variants were of diverse types, affecting 556 of 1,217 genes across hereditary cancer, cardiovascular, neurological, pediatric, reproductive carrier screening, and other indicated tests. Conclusion: The analytic and clinical sensitivity of NGS workflows can vary considerably, particularly for prevalent, technically challenging variants. This can have important implications for the design and validation of tests (by laboratories) and the selection of tests (by clinicians) for a wide range of clinical indications.


2021 ◽  
Author(s):  
F. Lencina ◽  
A.M. Landau ◽  
M.G. Pacheco ◽  
K. Kobayashi ◽  
A.R. Prina

Abstract In a previous work, a polymorphism detection strategy based on mismatch digestion was applied to the chloroplast genome of barley seedlings that carried the chloroplast mutator (cpm) genotype through many generations. Sixty-two different one- or two-nucleotide polymorphisms were detected along with four large indels: an insertion of 15 bp in the intergenic region between the tRNAHis and rps19 genes, a deletion of 620 bp in the psbA gene, a deletion of 79 bp in the intergenic region between the rpl33 and rps18 genes, and a deletion of 45 bp in the rps3 gene. In the present investigation, we analyzed direct repeats located at the borders of those four large indels. Furthermore, we investigated the consequences for protein expression of the large indels located in coding regions. The deletion of 620 bp in the psbA gene was lethal at the second leaf stage when homoplastomic. The deletion of 45 bp in the rps3 gene, which eliminates 15 amino acids, did not affect the viability of the seedlings in homoplastomy. Interestingly, the deleted segment is also lacking in the wild type version of the rps3 gene of maize and sorghum. The presence of direct repeats at the borders of the four large indels suggests that they could have originated by illegitimate recombination. This would be in agreement with a previous hypothesis that the Cpm gene product would correspond to a mismatch repair (MMR) protein devoted to maintaining plastome stability by playing fundamental roles in mismatch repair during replication and avoiding illegitimate recombination.
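The check for direct repeats at deletion borders can be illustrated with a small sketch. It is not the authors' procedure: the function name `flanking_direct_repeat`, the 0-based coordinates, and the toy sequence are all assumptions made for illustration. It only tests one orientation, comparing the start of the deleted segment with the sequence immediately downstream of the deletion.

```python
def flanking_direct_repeat(ref: str, start: int, length: int) -> str:
    """Return the direct repeat shared by the start of a deleted segment
    and the sequence immediately following the deletion.

    ref    -- reference sequence (hypothetical input)
    start  -- 0-based index of the first deleted base
    length -- number of deleted bases
    """
    end = start + length  # index of the first base after the deletion
    n = 0
    # Extend the match base by base while both positions are valid and agree.
    while n < length and end + n < len(ref) and ref[start + n] == ref[end + n]:
        n += 1
    return ref[start:start + n]


# Toy reference: flank + repeat + unique segment + repeat + flank.
toy_ref = "TT" + "GACC" + "AAAAA" + "GACC" + "TT"
repeat = flanking_direct_repeat(toy_ref, start=2, length=9)
```

In the toy sequence, deleting the 9 bases `GACCAAAAA` leaves one copy of `GACC`, which is the pattern expected when a deletion arises by recombination between two copies of a short repeat.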


2021 ◽  
Author(s):  
Yunfeng Wang ◽  
Haoliang Xue ◽  
Christine Pourcel ◽  
Yang Du ◽  
Daniel Gautheret

Abstract The detection of genome variants, including point mutations, indels and structural variants, is a fundamental and challenging computational problem. We address here the problem of variant detection between two deep-sequencing (DNA-seq) samples, such as two human samples from an individual patient, or two samples from distinct bacterial strains. The preferred strategy in such a case is to align each sample to a common reference genome, collect all variants and compare these variants between samples. Such mapping-based protocols have several limitations. DNA sequences with large indels, aggregated mutations and structural variants are hard to map to the reference. Furthermore, DNA sequences cannot be mapped reliably to genomic low complexity regions and repeats. Herein, we introduce 2-kupl, a k-mer based, mapping-free protocol to detect variants between two DNA-seq samples. On simulated and actual data, 2-kupl achieves a higher precision than other mapping-free protocols. Applying 2-kupl to prostate cancer whole exome data, we identify a number of candidate variants in hard-to-map regions and propose potential novel recurrent variants in this disease.
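The general idea behind k-mer based, mapping-free comparison can be sketched as follows. This is not the 2-kupl implementation; it is a minimal illustration that assumes error-free reads and ignores reverse complements, and the names `count_kmers`, `sample_specific_kmers`, and the `min_count` threshold are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in a list of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def sample_specific_kmers(case_reads, control_reads, k=5, min_count=2):
    """Return k-mers seen at least min_count times in the case sample
    and never in the control sample (no reference genome involved)."""
    case = count_kmers(case_reads, k)
    control = count_kmers(control_reads, k)
    return {kmer for kmer, n in case.items()
            if n >= min_count and kmer not in control}


# Toy data: the case reads differ from the control reads by one base (A -> T).
control_reads = ["ACGTACGTAC"] * 2
case_reads = ["ACGTTCGTAC"] * 2
specific = sample_specific_kmers(case_reads, control_reads, k=5)
```

With k = 5, a single substitution produces five case-specific k-mers, one for each window that covers the changed base. Because no alignment is performed, the same comparison applies to reads that span large indels or repeats, which is the motivation given in the abstract.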


2020 ◽  
Author(s):  
Stephen E Lincoln ◽  
Tina Hambuch ◽  
Justin M Zook ◽  
Sara L Bristow ◽  
Kathryn Hatchell ◽  
...  

Purpose: To evaluate the impact of technically challenging variants on the implementation, validation, and diagnostic yield of commonly used clinical genetic tests. Such variants include large indels, small CNVs, complex alterations, and variants in low-complexity or segmentally duplicated regions. Methods: An interlaboratory pilot study used novel synthetic specimens to assess detection of challenging variant types by various NGS-based workflows. One well-performing workflow was further validated and used in clinician-ordered testing of more than 450,000 patients. Results: In the interlaboratory study, only two of 13 challenging variants were detected by all 10 workflows, and just three workflows detected all 13. Limitations were also observed among 11 less-challenging indels. In clinical testing, 21.6% of patients carried one or more pathogenic variants, of which 13.8% (17,561) were classified as technically challenging. These variants were of diverse types, affecting 556 of 1,217 genes across hereditary cancer, cardiovascular, neurological, pediatric, reproductive carrier screening, and other indicated tests. Conclusion: The analytic and clinical sensitivity of NGS workflows can vary considerably, particularly for prevalent, technically challenging variants. This can have important implications for the design and validation of tests (by laboratories) and the selection of tests (by clinicians) for a wide range of clinical indications.


Author(s):  
Justin Wagner ◽  
Nathan D Olson ◽  
Lindsay Harris ◽  
Ziad Khan ◽  
Jesse Farek ◽  
...  

Abstract Genome in a Bottle (GIAB) benchmarks have been widely used to help validate clinical sequencing pipelines and develop new variant calling and sequencing methods. Here we use accurate long and linked reads to expand the prior benchmark to include difficult-to-map regions and segmental duplications that are not readily accessible to short reads. Our new benchmark adds more than 300,000 SNVs, 50,000 indels, and 16% new exonic variants, many in challenging, clinically relevant genes not previously covered (e.g., PMS2). We increase coverage of the autosomal GRCh38 assembly from 85% to 92%, while excluding problematic regions for benchmarking small variants (e.g., copy number variants and assembly errors) that should not have been in the previous version. Our new benchmark reliably identifies both false positives and false negatives across multiple short-, linked-, and long-read based variant calling methods. As an example of its utility, this benchmark identifies eight times more false negatives in a short read variant call set relative to our previous benchmark, mostly in difficult-to-map regions. To enable robust small variant benchmarking, we still exclude 3.6% of GRCh37 and 5.0% of GRCh38 in (1) highly repetitive regions such as large, highly similar segmental duplications and the centromere not accessible to our data and (2) regions where our sample is highly divergent from the reference due to large indels, structural variation, copy number variation, and/or errors in the reference (e.g., some KIR genes that have duplications in HG002). We have demonstrated the utility of this benchmark to assess performance in more challenging regions, which enables benchmarking in more difficult genes and continued technology and bioinformatics development. The v4.2.1 benchmarks are available under ftp://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/.


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Manish Goel ◽  
Hequan Sun ◽  
Wen-Biao Jiao ◽  
Korbinian Schneeberger

Abstract Genomic differences range from single nucleotide differences to complex structural variations. Current methods typically annotate sequence differences ranging from SNPs to large indels accurately but do not unravel the full complexity of structural rearrangements, including inversions, translocations, and duplications, where highly similar sequence changes in location, orientation, or copy number. Here, we present SyRI, a pairwise whole-genome comparison tool for chromosome-level assemblies. SyRI starts by finding rearranged regions and then searches for differences in the sequences, which are distinguished for residing in syntenic or rearranged regions. This distinction is important as rearranged regions are inherited differently compared to syntenic regions.


2019 ◽  
Vol 3 (3) ◽  
pp. 327-334 ◽  
Author(s):  
Soragia Athina Gkazi

Abstract Recent advances in the era of genetic engineering have significantly improved our ability to make precise changes in the genomes of human cells. Throughout the years, clinical trials based on gene therapies have led to the cure of diseases such as X-linked severe combined immunodeficiency (SCID-X1), adenosine deaminase deficiency (ADA-SCID) and Wiskott–Aldrich syndrome. Despite the success gene therapy has had, there is still the risk of genotoxicity due to the potential oncogenesis introduced by utilising viral vectors. Research has focused on alternative strategies like genome editing without viral vectors as a means to reduce genotoxicity introduced by the viral vectors. Although there is an extensive use of RNA-guided genome editing via the clustered regularly interspaced short palindromic repeats (CRISPR) and associated protein-9 (Cas9) technology for biomedical research, its genome-wide target specificity and its genotoxic side effects remain controversial. There have been reports of on- and off-target effects created by CRISPR–Cas9 that can include small and large indels and inversions, highlighting the potential risk of insertional mutagenesis. In the last few years, a plethora of in silico, in vitro and in vivo genome-wide assays have been introduced with the sole purpose of profiling these effects. Here, we are going to discuss the genotoxic obstacles in gene therapies and give an up-to-date overview of methodologies for quantifying CRISPR–Cas9 effects.


2019 ◽  
Author(s):  
Lesley M Chapman ◽  
Noah Spies ◽  
Patrick Pai ◽  
Chun Shen Lim ◽  
Andrew Carroll ◽  
...  

Abstract A high quality benchmark for small variants encompassing 88 to 90% of the reference genome has been developed for seven Genome in a Bottle (GIAB) reference samples. However, a reliable benchmark for large indels and structural variants (SVs) is yet to be defined. In this study, we manually curated 1235 SVs which can ultimately be used to evaluate SV callers or train machine learning models. We developed a crowdsourcing app – SVCurator – to help curators manually review large indels and SVs within the human genome, and report their genotype and size accuracy. SVCurator is a Python Flask-based web platform that displays images from short, long, and linked read sequencing data from the GIAB Ashkenazi Jewish Trio son [NIST RM 8391/HG002]. We asked curators to assign labels describing SV type (deletion or insertion), size accuracy, and genotype for 1235 putative insertions and deletions sampled from different size bins between 20 and 892,149 bp. The crowdsourced results were highly concordant, with 37 out of the 61 curators having at least 78% concordance with a set of ‘expert’ curators, where there was 93% concordance amongst ‘expert’ curators. This produced high confidence labels for 935 events. When compared to the heuristic-based draft benchmark SV callset from GIAB, the SVCurator crowdsourced labels were 94.5% concordant with the benchmark set. We found that curators can successfully evaluate putative SVs when given evidence from multiple sequencing technologies.


2018 ◽  
Author(s):  
Xian Fan ◽  
Jie Xu ◽  
Luay Nakhleh

Abstract Optical Maps (OM) provide reads that are very long, and thus can be used to detect large indels not detectable by the shorter reads provided by sequence-based technologies such as Illumina and PacBio. Two existing tools for detecting large indels from OM data are BioNano Solve and OMSV. However, these two tools may miss indels with weak signals. We propose a local-assembly based approach, OMIndel, to detect large indels with OM data. The results of applying OMIndel to empirical data demonstrate that it is able to detect indels with weak signals. Furthermore, compared with the other two OM-based methods, OMIndel has a lower false discovery rate. We also investigated the indels that can only be detected by OM but not Illumina, PacBio or 10X, and we found that they mostly fall into two categories: complex events or indels in repetitive regions. This implies that adding the OM data to sequence-based technologies can provide significant progress towards a more complete characterization of structural variants (SVs). The algorithm has been implemented in Perl and is publicly available at https://bitbucket.org/xianfan/optmethod.


2018 ◽  
Author(s):  
Wangjun Wu ◽  
Zengkai Zhang ◽  
Zhe Chao ◽  
Bojiang Li ◽  
Caibo Ning ◽  
...  

ABSTRACT In livestock, glycolytic potential (GP) is a critical indicator for evaluating meat quality. To date, two major genes, protein kinase AMP-activated γ3 non-catalytic subunit gene (PRKAG3) and phosphorylase kinase catalytic subunit gamma 1 (PHKG1), and the corresponding causal mutations influencing GP have been confirmed in pigs. Therefore, the aim of this study was to identify novel candidate genes and variations related to GP-related traits using a four-hybrid pig model [Pietrain (P) × Duroc (D)] × [(Landrace) × (Yorkshire)]. We constructed a total of six RNA-seq libraries using longissimus dorsi (LD) muscles, and each library contained two higher GP (H) or two lower GP (L) individuals. A total of 525, 698 and 135 differentially expressed genes (DEGs) were identified between the H11 vs L11, H9 vs L9, and H5 vs L5 groups, respectively, using the PossionDis method. Notably, we found that 97 non-redundant DEGs from the three paired comparison groups mapped to GP-related QTLs. Moreover, 69 DEGs were identified between the H (H11, H9 and H5) and L (L11, L9 and L5) groups using the NOIseq method. Additionally, 1,076 potential specific SNPs were identified between the H and L groups, and approximately 40 large indels with a length ≥ 5 bp were identified in each sequencing library. In conclusion, our data provide a foundation for further confirming the key genes and functional mutations affecting GP-related traits in pigs, and also pave the way for elucidating the underlying molecular regulatory mechanisms of glycogen metabolism in future studies. Moreover, this study might provide valuable information for the study of human glycogen storage diseases.

