Accurate measurement of microsatellite length by disrupting its tandem repeat structure

2021 ◽  
Author(s):  
Dan Levy ◽  
Zihua Wang ◽  
Andrea Moffitt ◽  
Michael H. Wigler

Replication of tandem repeats of simple sequence motifs, also known as microsatellites, is error-prone, and variable lengths frequently arise during population expansions. Microsatellite length variations could therefore serve as markers for cancer. However, accurate quantitation of microsatellite lengths is difficult with current methods because of the high error rate during amplification and sequencing. We have solved this problem by using partial mutagenesis to disrupt enough of the repeat structure that it replicates faithfully, yet not so much that the flanking regions cannot be reliably identified. In this work we use bisulfite mutagenesis to convert C to U, which is later read as T. Compared to untreated templates, we achieve a reduction of three orders of magnitude in the error rate per round of replication. By requiring two independent first copies of an initial template, we reach error rates below one in a million. We discuss potential clinical applications of this method.
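The idea of partially disrupting a repeat can be illustrated with a small simulation. The following is a minimal sketch, not the authors' protocol: the function names, the per-base conversion probability, and the `CA` motif are assumptions chosen for illustration, and the abstract does not state the conversion rate actually used.

```python
import random


def bisulfite_convert(seq, conversion_rate, rng):
    """Turn each C into T independently with probability conversion_rate.

    conversion_rate is a hypothetical parameter; chemically, C becomes U,
    which a sequencer later reads as T.
    """
    return "".join(
        "T" if base == "C" and rng.random() < conversion_rate else base
        for base in seq
    )


def longest_perfect_run(seq, motif):
    """Return the longest uninterrupted run of motif, counted in copies."""
    best = run = 0
    i = 0
    while i + len(motif) <= len(seq):
        if seq.startswith(motif, i):
            run += 1
            best = max(best, run)
            i += len(motif)
        else:
            run = 0
            i += 1
    return best


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed so the run is repeatable
    template = "CA" * 20            # a perfect 20-copy microsatellite
    treated = bisulfite_convert(template, 0.5, rng)
    print(template)
    print(treated)
    print(longest_perfect_run(template, "CA"),
          longest_perfect_run(treated, "CA"))
```

In this toy model the treated sequence keeps its original length, so the repeat count is still recoverable, while the longest perfect run of the motif can only stay the same or get shorter.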

Author(s):  
Shinichi Morishita ◽  
Kazuki Ichikawa ◽  
Gene Myers

Abstract Motivation: Long tandem repeat expansions of more than 1000 nt have been suggested to be associated with diseases, but remain largely unexplored in individual human genomes because read lengths have been too short. However, new long-read sequencing technologies can produce single reads of 10,000 nt or more that can span such repeat expansions, although these long reads have high error rates of 10%-20%, which complicates the detection of repetitive elements. Moreover, most traditional algorithms for finding tandem repeats are designed to find short tandem repeats (< 1000 nt) and cannot effectively handle the high error rate of long reads in a reasonable amount of time. Results: Here, we report an efficient algorithm for solving this problem that takes advantage of the length of the repeat. Namely, a long tandem repeat has hundreds or thousands of approximate copies of the repeated unit, so despite the error rate, many short k-mers will be error-free in many copies of the unit. We exploited this characteristic to develop a method for first estimating regions that could contain a tandem repeat, by analyzing the k-mer frequency distributions of fixed-size windows across the target read, followed by an algorithm that assembles the k-mers of a putative region into the consensus repeat unit by greedily traversing a de Bruijn graph. Experimental results indicated that the proposed algorithm largely outperformed Tandem Repeats Finder (TRF), a widely used program for finding tandem repeats, in terms of sensitivity. Software availability: https://github.com/morisUtokyo/mTR
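The second step described above, assembling frequent k-mers into a consensus unit by greedy traversal of a de Bruijn graph, can be sketched as follows. This is a simplified illustration under stated assumptions and not the mTR implementation: the function names are hypothetical, the window-based region estimate is omitted, and the walk simply gives up (returns `None`) if it fails to close a cycle.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def greedy_consensus_unit(seq, k):
    """Greedily walk the k-mer (de Bruijn) graph from the most frequent k-mer.

    At each step the most frequent successor (overlapping by k-1 bases) is
    taken. When the walk returns to the starting k-mer, the bases spelled by
    the cycle are returned as the putative repeat unit (up to rotation).
    """
    counts = kmer_counts(seq, k)
    if not counts:
        return None
    # Sort first so that ties are broken deterministically.
    start = max(sorted(counts), key=counts.get)
    unit, current, visited = start, start, {start}
    while True:
        successors = [
            current[1:] + base
            for base in "ACGT"
            if counts[current[1:] + base] > 0
        ]
        if not successors:
            return None              # dead end: no cycle found
        nxt = max(successors, key=counts.get)
        if nxt == start:
            # Drop the k-1 bases that overlap the start of the cycle.
            return unit[: len(unit) - (k - 1)]
        if nxt in visited:
            return None              # looped without reaching the start
        visited.add(nxt)
        unit += nxt[-1]
        current = nxt


if __name__ == "__main__":
    read = list("ACGTT" * 50)
    # Three hand-placed substitution errors stand in for sequencing noise.
    read[12], read[100], read[201] = "A", "C", "G"
    print(greedy_consensus_unit("".join(read), 3))
```

Because the error-containing k-mers are rare compared with the error-free ones, the greedy choice follows the true unit in this example. The recovered unit may be a rotation of the one used to build the read, since the walk starts at whichever k-mer is most frequent.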


2018 ◽  
Author(s):  
Satomi Mitsuhashi ◽  
Martin C Frith ◽  
Takeshi Mizuguchi ◽  
Satoko Miyatake ◽  
Tomoko Toyota ◽  
...  

Abstract Tandemly repeated sequences are highly mutable and variable features of genomes. Tandem repeat expansions are responsible for a growing list of human diseases, even though it is hard to determine tandem repeat sequences with current DNA sequencing technology. Recent long-read technologies are promising, because the DNA reads are often longer than the repetitive regions, but are hampered by high error rates. Here, we report robust detection of human repeat expansions from careful alignments of long (PacBio and nanopore) reads to a reference genome. Our method (tandem-genotypes) is robust to systematic sequencing errors, inexact repeats with fuzzy boundaries, and low sequencing coverage. By comparing to healthy controls, we can prioritize pathological expansions within the top 10 out of 700000 tandem repeats in the genome. This may help to elucidate the many genetic diseases whose causes remain unknown.


1995 ◽  
Vol 35 (4) ◽  
pp. 347-351 ◽  
Author(s):  
Douglas P W Kingsford

A review is presented of autopsy evidence demonstrating clinical diagnostic inaccuracy. Startling results emerge: the major clinical diagnosis is not confirmed in up to 45 per cent of cases, with typical error rates of up to 30 per cent; autopsy reveals unexpected major findings in up to 33 per cent of cases; management should have been different in up to 24 per cent of cases; clinicians cannot identify which patients are likely to have errant diagnoses; clinically ‘certain’ diagnoses still have a high error rate. These error rates have not changed significantly since an early study in 1912 despite the current widespread use of advanced investigation modalities.


Biomolecules ◽  
2018 ◽  
Vol 8 (4) ◽  
pp. 146 ◽  
Author(s):  
Hyun-Ju Hwang ◽  
Jin-Woo Han ◽  
Hancheol Jeon ◽  
Jong Han

Lectin is an important protein in medical and pharmacological applications. Impurities in lectin derived from natural sources and the generation of inactive proteins by recombinant technology are major obstacles for the use of lectins. Expressing recombinant lectin with a tandem repeat structure can potentially overcome these problems, but few studies have systematically examined this possibility. This was investigated in the present study using three distinct forms of recombinant mannose-binding lectin from Bryopsis plumosa (BPL2), i.e., the monomer (rD1BPL2), as well as the dimer (rD2BPL2) and tetramer (rD4BPL2) arranged as tandem repeats. The concentration of the inducer molecule isopropyl β-D-1-thiogalactopyranoside and the induction time had no effect on the efficiency of the expression of each construct. Of the tested constructs, only rD4BPL2 showed hemagglutination activity towards horse erythrocytes; its activity towards these cells was 64 times higher than that of native BPL2. Recombinant and native BPL2 showed differences in carbohydrate specificity; the activity of rD4BPL2 was inhibited by the glycoprotein fetuin, whereas that of native BPL2 was also inhibited by d-mannose. Our results indicate that expression as tandem repeat sequences can increase the efficiency of lectin production on a large scale using a bacterial expression system.


Author(s):  
Mohammed Ahmed Ezzelregal Hassan ◽  
Mohamed Elsayed Hasan

Many studies have surveyed the completed genomes of prokaryotes and eukaryotes, focusing on the correlation between the percentage of microsatellite sequences in a genome and the overall genome size. Fewer studies have examined repetitive sequences, whether simple sequence repeats or long tandem repeats, in virus genomes. Simple sequence repeats (SSRs) are the most important regions for recombination and for moving repeat blocks from one site to another within a genome. A tool was designed and programmed in Visual Basic 6.0 to find long tandem repeats in the DNA sequences of small genomes. The tool is named “Repeater Finder Regular Expression” (RFRE), Version 1.0, 2016. It was used to discover different patterns of long tandem repeat (LTR) motifs in the completed genomes of human coronavirus strains by means of joined regular expressions. In this study, twenty-nine accession numbers of completed human coronavirus (HKU1) genomes were retrieved from GenBank. Through the tool, a researcher can write individual or joined regular expression patterns to search for a specific nucleotide motif within a complete genome. The RFRE tool found perfect long tandem repeats of three different total lengths (240 bp, 300 bp and 480 bp). A dot plot gave a visual overview of the long tandem repeat sequences in the completed genome sequence (KF430201.1) of human coronavirus. The genomic dot plot tool YASS was used as a genomic similarity search tool to check for uninterrupted repeats and to confirm the sensitivity of the RFRE tool.
To identify recombination sites, the RAT tool was applied to the completed genomes of human coronaviruses. The RAT tool recognized a recombination site at nucleotide position 3012, and this same position was also recognized as the starting position of a long tandem repeat. A precise motif was predicted from the translated repeats of human coronavirus using the PRATT tool. There was a relationship between the total length of long tandem repeats and the genome size of human coronavirus, with a correlation value R² of 0.451. In conclusion, this study shows the importance of finding the long tandem repeats of human coronavirus and reports a relationship between the completed genome size of human coronavirus types and their long tandem repeats. Nucleotide position 3012 was a hot spot for recombination among the complete genomes of human coronavirus and was also identified as a repetitive site in the genomes of human coronavirus (HKU1). The repeats in human coronavirus (HKU1) were predicted to play a major role in virus evolution.
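Searching for perfect tandem repeats with a regular expression can be sketched with a backreference. The example below is only an illustration in Python's standard `re` module: the RFRE tool itself was written in Visual Basic 6.0, and its actual patterns are not given in the abstract, so the function name, parameters, and pattern here are assumptions.

```python
import re


def find_perfect_tandem_repeats(seq, unit_len, min_copies=2):
    """Find uninterrupted tandem repeats of a unit of fixed length.

    The pattern captures one unit of unit_len bases and then requires the
    same text (backreference \\1) at least min_copies - 1 more times.
    Returns a list of (start position, unit, number of copies).
    """
    pattern = re.compile(
        r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1)
    )
    return [
        (m.start(), m.group(1), len(m.group(0)) // unit_len)
        for m in pattern.finditer(seq)
    ]


if __name__ == "__main__":
    # Hypothetical toy sequence: four copies of ACT between short flanks.
    toy = "GG" + "ACT" * 4 + "GA"
    for start, unit, copies in find_perfect_tandem_repeats(toy, unit_len=3):
        print(start, unit, copies)
```

A pattern of this kind only reports perfect, uninterrupted copies, which matches the abstract's description of "perfect" repeats; repeats with substitutions or indels would need a different approach.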


2020 ◽  
Vol 48 (21) ◽  
pp. 11868-11879 ◽  
Author(s):  
Junwei Ji ◽  
Anil Day

Abstract A novel family of DNA polymerases replicates organelle genomes in a wide distribution of taxa encompassing plants and protozoans. Making error-prone mutator versions of gamma DNA polymerases revolutionised our understanding of animal mitochondrial genomes but similar advances have not been made for the organelle DNA polymerases present in plant mitochondria and chloroplasts. We tested the fidelities of error prone tobacco organelle DNA polymerases using a novel positive selection method involving replication of the phage lambda cI repressor gene. Unlike gamma DNA polymerases, ablation of 3′–5′ exonuclease function resulted in a modest 5–8-fold error rate increase. Combining exonuclease deficiency with a polymerisation domain substitution raised the organelle DNA polymerase error rate by 140-fold relative to the wild type enzyme. This high error rate compares favourably with error-rates of mutator versions of animal gamma DNA polymerases. The error prone organelle DNA polymerase introduced mutations at multiple locations ranging from two to seven sites in half of the mutant cI genes studied. Single base substitutions predominated including frequent A:A (template: dNMP) mispairings. High error rate and semi-dominance to the wild type enzyme in vitro make the error prone organelle DNA polymerase suitable for elevating mutation rates in chloroplasts and mitochondria.


2019 ◽  
Author(s):  
Michael Hahn ◽  
Frank Keller ◽  
Yonatan Bisk ◽  
Yonatan Belinkov

Intuitively, human readers cope easily with errors in text; typos, misspelling, word substitutions, etc. do not unduly disrupt natural reading. Previous work indicates that letter transpositions result in increased reading times, but it is unclear if this effect generalizes to more natural errors. In this paper, we report an eye-tracking study that compares two error types (letter transpositions and naturally occurring misspelling) and two error rates (10% or 50% of all words contain errors). We find that human readers show unimpaired comprehension in spite of these errors, but error words cause more reading difficulty than correct words. Also, transpositions are more difficult than misspellings, and a high error rate increases difficulty for all words, including correct ones. We then present a computational model that uses character-based (rather than traditional word-based) surprisal to account for these results. The model explains that transpositions are harder than misspellings because they contain unexpected letter combinations. It also explains the error rate effect: upcoming words are more difficult to predict when the context is degraded, leading to increased surprisal.
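Character-based surprisal is the negative log probability of each character given its context, summed over a word. The sketch below uses a toy character bigram model with add-one smoothing as a stand-in; it is not the model from the paper, and the training text, function names, and smoothing scheme are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_char_bigram(text):
    """Collect character bigram and context counts from a training text."""
    pair_counts = Counter(zip(text, text[1:]))
    context_counts = Counter(text[:-1])
    vocab = set(text)
    return pair_counts, context_counts, vocab


def char_surprisal(prev, ch, model):
    """Surprisal in bits of ch following prev, with add-one smoothing.

    One extra vocabulary slot is reserved for unseen characters.
    """
    pair_counts, context_counts, vocab = model
    prob = (pair_counts[(prev, ch)] + 1) / (context_counts[prev] + len(vocab) + 1)
    return -math.log2(prob)


def word_surprisal(word, model):
    """Sum character surprisals over a word, starting from a space context."""
    padded = " " + word
    return sum(char_surprisal(a, b, model) for a, b in zip(padded, padded[1:]))


if __name__ == "__main__":
    model = train_char_bigram("the cat sat on the mat and the hat ")
    print(word_surprisal("the", model))   # familiar letter sequence
    print(word_surprisal("hte", model))   # transposed letters
```

Under this toy model the transposed form receives higher surprisal than the correct word because its letter pairs were never seen in training, which mirrors the abstract's explanation that transpositions contain unexpected letter combinations.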


2019 ◽  
Vol 28 (4) ◽  
pp. 1411-1431 ◽  
Author(s):  
Lauren Bislick ◽  
William D. Hula

Purpose: This retrospective analysis examined group differences in error rate across 4 contextual variables (clusters vs. singletons, syllable position, number of syllables, and articulatory phonetic features) in adults with apraxia of speech (AOS) and adults with aphasia only. Group differences in the distribution of error type across contextual variables were also examined. Method: Ten individuals with acquired AOS and aphasia and 11 individuals with aphasia participated in this study. In the context of a 2-group experimental design, the influence of 4 contextual variables on error rate and error type distribution was examined via repetition of 29 multisyllabic words. Error rates were analyzed using Bayesian methods, whereas distribution of error type was examined via descriptive statistics. Results: There were 4 findings of robust differences between the 2 groups. These differences were found for syllable position, number of syllables, manner of articulation, and voicing. Group differences were less robust for clusters versus singletons and place of articulation. Results of error type distribution show a high proportion of distortion and substitution errors in speakers with AOS and a high proportion of substitution and omission errors in speakers with aphasia. Conclusion: Findings add to the continued effort to improve the understanding and assessment of AOS and aphasia. Several contextual variables more consistently influenced breakdown in participants with AOS compared to participants with aphasia and should be considered during the diagnostic process. Supplemental Material: https://doi.org/10.23641/asha.9701690


2014 ◽  
Vol 53 (05) ◽  
pp. 343-343

We have to report marginal changes in the empirical type I error rates for the cut-offs 2/3 and 4/7 of Table 4, Table 5 and Table 6 of the paper “Influence of Selection Bias on the Test Decision – A Simulation Study” by M. Tamm, E. Cramer, L. N. Kennes, N. Heussen (Methods Inf Med 2012; 51: 138–143). In a small number of cases the kind of representation of numeric values in SAS has resulted in wrong categorization due to a numeric representation error of differences. We corrected the simulation by using the round function of SAS in the calculation process with the same seeds as before. For Table 4 the value for the cut-off 2/3 changes from 0.180323 to 0.153494. For Table 5 the value for the cut-off 4/7 changes from 0.144729 to 0.139626 and the value for the cut-off 2/3 changes from 0.114885 to 0.101773. For Table 6 the value for the cut-off 4/7 changes from 0.125528 to 0.122144 and the value for the cut-off 2/3 changes from 0.099488 to 0.090828. The sentence on p. 141 “E.g. for block size 4 and q = 2/3 the type I error rate is 18% (Table 4).” has to be replaced by “E.g. for block size 4 and q = 2/3 the type I error rate is 15.3% (Table 4).”. There were only minor changes smaller than 0.03. These changes do not affect the interpretation of the results or our recommendations.
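The kind of problem described in this erratum, a difference of floating-point numbers landing on the wrong side of a cut-off, can be reproduced in any language that uses binary floating point. The sketch below uses Python rather than SAS, and the function name, the values, and the choice of 10 decimal digits are assumptions for illustration; it does not reproduce the authors' simulation.

```python
def categorize(diff, cutoff, digits=None):
    """Return True when diff reaches the cut-off.

    If digits is given, both values are rounded first, which removes tiny
    binary representation errors before the comparison is made.
    """
    if digits is not None:
        diff = round(diff, digits)
        cutoff = round(cutoff, digits)
    return diff >= cutoff


if __name__ == "__main__":
    diff = 0.3 - 0.1   # mathematically 0.2, but stored slightly below it
    print(repr(diff))
    print(categorize(diff, 0.2))              # unrounded comparison
    print(categorize(diff, 0.2, digits=10))   # rounded comparison
```

The unrounded comparison puts the value in the wrong category even though the two quantities are equal on paper, while rounding both sides to a fixed number of digits restores the intended result.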

