Character-based Surprisal as a Model of Reading Difficulty in the Presence of Errors

Intuitively, human readers cope easily with errors in text: typos, misspellings, word substitutions, etc. do not unduly disrupt natural reading. Previous work indicates that letter transpositions result in increased reading times, but it is unclear whether this effect generalizes to more natural errors. In this paper, we report an eye-tracking study that compares two error types (letter transpositions and naturally occurring misspellings) and two error rates (10% or 50% of all words contain errors). We find that human readers show unimpaired comprehension in spite of these errors, but error words cause more reading difficulty than correct words. Also, transpositions are more difficult than misspellings, and a high error rate increases difficulty for all words, including correct ones. We then present a computational model that uses character-based (rather than traditional word-based) surprisal to account for these results. The model explains that transpositions are harder than misspellings because they contain unexpected letter combinations. It also explains the error rate effect: upcoming words are more difficult to predict when the context is degraded, leading to increased surprisal.
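The idea that transpositions are costly because they contain unexpected letter combinations can be illustrated with a very small character model. The sketch below is not the model from the paper: it assumes a character bigram model with add-one smoothing, trained on a toy corpus invented for this example, and the function names (`train_char_bigrams`, `char_surprisal`) are hypothetical. Surprisal of a word is taken as the sum of -log2 P(c_i | c_{i-1}) over its characters.

```python
import math
from collections import Counter


def train_char_bigrams(text):
    """Count character bigrams and left-context characters in the training text."""
    bigrams = Counter(zip(text, text[1:]))
    contexts = Counter(text[:-1])
    vocab = set(text)
    return bigrams, contexts, vocab


def char_surprisal(word, model, boundary=" "):
    """Sum of -log2 P(c_i | c_{i-1}) over the word, using add-one smoothing."""
    bigrams, contexts, vocab = model
    padded = boundary + word + boundary
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        # Add-one smoothing keeps unseen letter pairs at a small non-zero probability.
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + len(vocab))
        total += -math.log2(p)
    return total


# Toy training corpus (an assumption for illustration only).
corpus = "the reader reads the text and the reader reads the words in the text "
model = train_char_bigrams(corpus)

correct = char_surprisal("reader", model)
transposed = char_surprisal("raeder", model)  # letters 2 and 3 swapped

print(f"reader: {correct:.2f} bits")
print(f"raeder: {transposed:.2f} bits")
```

In this toy setting the transposed form scores higher because the pairs "ra", "ae" and "ed" never occur in the training text, whereas every pair in "reader" does. A bigram model cannot capture the error rate effect described in the abstract, which depends on longer degraded context.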


Accurate measurement of microsatellite length by disrupting its tandem repeat structure

10.1101/2021.12.09.471828 ◽

2021 ◽

Author(s):

Dan Levy ◽

Zihua Wang ◽

Andrea Moffitt ◽

Michael H. Wigler

Keyword(s):

Tandem Repeat ◽

Error Rate ◽

Tandem Repeats ◽

Clinical Applications ◽

Error Rates ◽

Sequence Motifs ◽

High Error Rate ◽

Repeat Structure ◽

Flanking Regions ◽

Simple Sequence

Replication of tandem repeats of simple sequence motifs, also known as microsatellites, is error prone and variable lengths frequently occur during population expansions. Therefore, microsatellite length variations could serve as markers for cancer. However, accurate error-free quantitation of microsatellite lengths is difficult with current methods because of a high error rate during amplification and sequencing. We have solved this problem by using partial mutagenesis to disrupt enough of the repeat structure so that it can replicate faithfully, yet not so much that the flanking regions cannot be reliably identified. In this work we use bisulfite mutagenesis to convert a C to a U, later read as T. Compared to untreated templates, we achieve three orders of magnitude reduction in the error rate per round of replication. By requiring two independent first copies of an initial template, we reach error rates below one in a million. We discuss potential clinical applications of this method.


A Review of Diagnostic Inaccuracy

Medicine Science and the Law ◽

10.1177/002580249503500413 ◽

1995 ◽

Vol 35 (4) ◽

pp. 347-351 ◽

Cited By ~ 6

Author(s):

Douglas P W Kingsford

Keyword(s):

Clinical Diagnosis ◽

Error Rate ◽

Error Rates ◽

High Error Rate ◽

Typical Error ◽

Clinical Diagnostic

A review is presented of autopsy evidence demonstrating clinical diagnostic inaccuracy. Startling results emerge: the major clinical diagnosis is not confirmed in up to 45 per cent of cases, with typical error rates of up to 30 per cent; autopsy reveals unexpected major findings in up to 33 per cent of cases; management should have been different in up to 24 per cent of cases; clinicians cannot identify which patients are likely to have errant diagnoses; clinically ‘certain’ diagnoses still have a high error rate. These error rates have not changed significantly since an early study in 1912 despite the current widespread use of advanced investigation modalities.


Construction of a highly error-prone DNA polymerase for developing organelle mutation systems

Nucleic Acids Research ◽

10.1093/nar/gkaa929 ◽

2020 ◽

Vol 48 (21) ◽

pp. 11868-11879 ◽

Cited By ~ 1

Author(s):

Junwei Ji ◽

Anil Day

Keyword(s):

Dna Polymerase ◽

Error Rate ◽

Dna Polymerases ◽

Plant Mitochondria ◽

Error Rates ◽

Wild Type ◽

High Error Rate ◽

Wild Type Enzyme ◽

Organelle Dna

A novel family of DNA polymerases replicates organelle genomes in a wide distribution of taxa encompassing plants and protozoans. Making error-prone mutator versions of gamma DNA polymerases revolutionised our understanding of animal mitochondrial genomes but similar advances have not been made for the organelle DNA polymerases present in plant mitochondria and chloroplasts. We tested the fidelities of error-prone tobacco organelle DNA polymerases using a novel positive selection method involving replication of the phage lambda cI repressor gene. Unlike gamma DNA polymerases, ablation of 3′–5′ exonuclease function resulted in a modest 5–8-fold error rate increase. Combining exonuclease deficiency with a polymerisation domain substitution raised the organelle DNA polymerase error rate by 140-fold relative to the wild type enzyme. This high error rate compares favourably with error rates of mutator versions of animal gamma DNA polymerases. The error-prone organelle DNA polymerase introduced mutations at multiple locations ranging from two to seven sites in half of the mutant cI genes studied. Single base substitutions predominated including frequent A:A (template: dNMP) mispairings. High error rate and semi-dominance to the wild type enzyme in vitro make the error-prone organelle DNA polymerase suitable for elevating mutation rates in chloroplasts and mitochondria.


An End-to-end Oxford Nanopore Basecaller Using Convolution-augmented Transformer

10.1101/2020.11.09.374165 ◽

2020 ◽

Author(s):

Xuan Lv ◽

Zhiguang Chen ◽

Yutong Lu ◽

Yuedong Yang

Keyword(s):

Computational Model ◽

Open Source ◽

Error Rate ◽

Nucleotide Sequences ◽

Nanopore Sequencing ◽

High Error Rate ◽

Slow Speed ◽

Global Context ◽

Oxford Nanopore ◽

Read Accuracy

Oxford Nanopore sequencing is fast becoming an active field in genomics, and it is critical to basecall nucleotide sequences from the complex electrical signals. Many efforts have been devoted to developing new basecalling tools over the years. However, the basecalled reads still suffer from a high error rate and slow speed. Here, we developed an open-source basecalling method, CATCaller, by simultaneously capturing global context through Attention and modeling local dependencies through dynamic convolution. The method was shown to consistently outperform the ONT default basecallers Albacore and Guppy, and a recently developed attention-based method, SACall, in read accuracy. More importantly, our method is fast through a heterogeneous computational model that integrates both CPUs and GPUs. When compared to SACall, the method is nearly 4 times faster on a single GPU, and is highly scalable in parallelization with a further speedup of 3.3 on a four-GPU node.


Perceptual Characteristics of Consonant Production in Apraxia of Speech and Aphasia

American Journal of Speech-Language Pathology ◽

10.1044/2019_ajslp-18-0169 ◽

2019 ◽

Vol 28 (4) ◽

pp. 1411-1431 ◽

Cited By ~ 3

Author(s):

Lauren Bislick ◽

William D. Hula

Keyword(s):

Error Rate ◽

Error Rates ◽

Group Differences ◽

Diagnostic Process ◽

Apraxia Of Speech ◽

Contextual Variables ◽

Error Type ◽

Type Distribution ◽

Phonetic Features ◽

Syllable Position

Purpose This retrospective analysis examined group differences in error rate across 4 contextual variables (clusters vs. singletons, syllable position, number of syllables, and articulatory phonetic features) in adults with apraxia of speech (AOS) and adults with aphasia only. Group differences in the distribution of error type across contextual variables were also examined. Method Ten individuals with acquired AOS and aphasia and 11 individuals with aphasia participated in this study. In the context of a 2-group experimental design, the influence of 4 contextual variables on error rate and error type distribution was examined via repetition of 29 multisyllabic words. Error rates were analyzed using Bayesian methods, whereas distribution of error type was examined via descriptive statistics. Results There were 4 findings of robust differences between the 2 groups. These differences were found for syllable position, number of syllables, manner of articulation, and voicing. Group differences were less robust for clusters versus singletons and place of articulation. Results of error type distribution show a high proportion of distortion and substitution errors in speakers with AOS and a high proportion of substitution and omission errors in speakers with aphasia. Conclusion Findings add to the continued effort to improve the understanding and assessment of AOS and aphasia. Several contextual variables more consistently influenced breakdown in participants with AOS compared to participants with aphasia and should be considered during the diagnostic process. Supplemental Material https://doi.org/10.23641/asha.9701690


Correction: “Influence of Selection Bias on the Test Decision – A Simulation Study”

Methods of Information in Medicine ◽

10.3414/me11-01-0043e ◽

2014 ◽

Vol 53 (05) ◽

pp. 343-343

Keyword(s):

Selection Bias ◽

Simulation Study ◽

Error Rate ◽

Type I Error ◽

Block Size ◽

Error Rates ◽

Type I ◽

Type I Error Rate ◽

Representation Error ◽

Numeric Representation

We have to report marginal changes in the empirical type I error rates for the cut-offs 2/3 and 4/7 of Table 4, Table 5 and Table 6 of the paper “Influence of Selection Bias on the Test Decision – A Simulation Study” by M. Tamm, E. Cramer, L. N. Kennes, N. Heussen (Methods Inf Med 2012; 51: 138 –143). In a small number of cases the kind of representation of numeric values in SAS has resulted in wrong categorization due to a numeric representation error of differences. We corrected the simulation by using the round function of SAS in the calculation process with the same seeds as before. For Table 4 the value for the cut-off 2/3 changes from 0.180323 to 0.153494. For Table 5 the value for the cut-off 4/7 changes from 0.144729 to 0.139626 and the value for the cut-off 2/3 changes from 0.114885 to 0.101773. For Table 6 the value for the cut-off 4/7 changes from 0.125528 to 0.122144 and the value for the cut-off 2/3 changes from 0.099488 to 0.090828. The sentence on p. 141 “E.g. for block size 4 and q = 2/3 the type I error rate is 18% (Table 4).” has to be replaced by “E.g. for block size 4 and q = 2/3 the type I error rate is 15.3% (Table 4).”. There were only minor changes smaller than 0.03. These changes do not affect the interpretation of the results or our recommendations.
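The correction attributes the wrong categorizations to how numeric values, and in particular differences, are represented in SAS. The same class of problem arises in any language that uses binary floating point. The Python sketch below is an illustration only, not the authors' SAS code: the specific comparison (the difference 1 - 1/3 against the cut-off 2/3), the helper name `at_most`, and the rounding precision of 10 decimals are all assumptions chosen for the example.

```python
cutoff = 2 / 3           # one of the cut-offs named in the correction
difference = 1 - 1 / 3   # mathematically equal to 2/3

# In binary floating point the difference lands one step above the cut-off,
# so a "less than or equal" test puts the value in the wrong category.
naive_within_cutoff = difference <= cutoff


def at_most(value, limit, digits=10):
    """Compare after rounding both sides to a fixed number of decimals (assumed precision)."""
    return round(value, digits) <= round(limit, digits)


rounded_within_cutoff = at_most(difference, cutoff)

print(repr(difference), repr(cutoff))
print("naive comparison:  ", naive_within_cutoff)
print("rounded comparison:", rounded_within_cutoff)
```

Rounding before comparison, as the correction describes doing with the SAS round function, removes the spurious last-digit difference so that mathematically equal values fall into the same category.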


The Error in ‘Error Rate’: Why Error Rates Are So Needed, Yet So Elusive

SSRN Electronic Journal ◽

10.2139/ssrn.3593309 ◽

2020 ◽

Author(s):

Itiel Dror

Keyword(s):

Error Rate ◽

Error Rates


Reducing Culture Reporting Errors in the Microbiology Laboratory

American Journal of Clinical Pathology ◽

10.1093/ajcp/aqz125.009 ◽

2019 ◽

Vol 152 (Supplement_1) ◽

pp. S131-S132

Author(s):

Kathryn Hogan ◽

Beena Umar ◽

Mohamed Alhamar ◽

Kathleen Callahan ◽

Linoj Samuel

Keyword(s):

Real Time ◽

Error Rate ◽

Reporting System ◽

Error Rates ◽

Tracking Errors ◽

Corrective Measures ◽

Safety Awareness ◽

Before And After ◽

Set Up ◽

System Errors

Objectives There are few papers that characterize types of errors in microbiology laboratories and scant research demonstrating the effects of interventions on microbiology lab errors. This study aims to categorize types of culture reporting errors found in microbiology labs and to document the error rates before and after interventions designed to reduce errors and improve overall laboratory quality. Methods To improve documentation of error incidence, a self-reporting system was changed to an automatic reporting system. Errors were categorized into five types: Gram stain (misinterpretations), identification (incorrect analysis), set up labeling (incorrect patient labels), procedures (not followed), and miscellaneous. Error rates were tracked according to technologist, and technologists were given real-time feedback by a manager. Error rates were also monitored in the daily quality meeting and frequently detected errors were discussed at staff meetings. Technologists attended a year-end review with a manager to improve their performance. To maintain these changes, policies were developed to monitor technologist error rate and to define corrective measures. If a certain number of errors per month was reached, technologists were required to undergo retraining by a manager. If a technologist failed to correct any error according to protocol, they were also potentially subject to corrective measures. Results In 2013, we recorded 0.5 errors per 1,000 tests. By 2018, we recorded only 0.1 errors per 1,000 tests, an 80% decrease. The yearly culture volume from 2013 to 2018 increased by 32%, while the yearly error rate went from 0.05% per year to 0.01% per year, a statistically significant decrease (P = .0007). Conclusion This study supports the effectiveness of the changes implemented to decrease errors in culture reporting. By tracking errors in real time and using a standardized process that involved timely follow-up, technologists were educated on error prevention. This practice increased safety awareness in our micro lab.
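The figures in the Results can be cross-checked with simple arithmetic. The short sketch below uses only the two rates quoted in the abstract; the helper names are hypothetical and introduced for illustration.

```python
def relative_decrease(before, after):
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


def per_thousand_to_percent(rate_per_1000):
    """Convert errors per 1,000 tests into a percentage of tests."""
    return rate_per_1000 / 1000 * 100


errors_2013 = 0.5  # errors per 1,000 tests, as reported for 2013
errors_2018 = 0.1  # errors per 1,000 tests, as reported for 2018

print(f"relative decrease: {relative_decrease(errors_2013, errors_2018):.0%}")
print(f"2013 rate: {per_thousand_to_percent(errors_2013):.2f}%")
print(f"2018 rate: {per_thousand_to_percent(errors_2018):.2f}%")
```

The per-1,000 rates correspond to the quoted 0.05% and 0.01%, and the drop from 0.5 to 0.1 is the stated 80% decrease.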


Evaluation of eavesdropping error-rates in higher-dimensional QKD system implemented using dynamic spatial modes

International Journal of Quantum Information ◽

10.1142/s0219749921500301 ◽

2021 ◽

Author(s):

Muhammad Kamran ◽

Tahir Malik ◽

Muhammad Mubashir Khan

Keyword(s):

Error Rate ◽

Photon Number ◽

Error Rates ◽

Security And Privacy ◽

Practical Implementation ◽

Secret Key ◽

Decoy State ◽

Rate Analysis ◽

Higher Dimensional ◽

Spatial Modes

Secure exchange of cryptographic keys is extremely important for any communication system where security and privacy of data are desirable. Although classical cryptographic algorithms provide computationally secure methods for secret key exchange, quantum key distribution (QKD) provides an extraordinary means to this end by guaranteeing unconditional security. Any malicious interception of communication by a man-in-the-middle on a QKD link immediately alerts sender and receiver by introducing an unavoidable error-rate. Higher-dimensional QKD protocols such as KMB09 exhibit higher eavesdropping error-rates with improved intrusion detection, but their practical implementation is still awaited. In this paper, we present the design and implementation of the KMB09 protocol using Laguerre–Gaussian orbital angular momentum to demonstrate and highlight the advantages of using dynamic spatial modes in a QKD system. A complete error-rate analysis of the KMB09 protocol implementation is presented with two different types of eavesdropping error-rates. Furthermore, we also demonstrate the decoy state method to show the robustness of the protocol against the photon-number-splitting attack.


Application of the False Discovery Rate to Quantitative Trait Loci Interval Mapping With Multiple Traits

Genetics ◽

10.1093/genetics/161.2.905 ◽

2002 ◽

Vol 161 (2) ◽

pp. 905-914 ◽

Cited By ~ 1

Author(s):

Hakkyo Lee ◽

Jack C M Dekkers ◽

M Soller ◽

Massoud Malek ◽

Rohan L Fernando ◽

...

Keyword(s):

Quantitative Trait Loci ◽

False Discovery Rate ◽

Error Rate ◽

Quantitative Trait ◽

Interval Mapping ◽

Error Rates ◽

Multiple Traits ◽

Test Statistic ◽

False Discovery ◽

Trait Loci

Controlling the false discovery rate (FDR) has been proposed as an alternative to controlling the genomewise error rate (GWER) for detecting quantitative trait loci (QTL) in genome scans. The objective here was to implement FDR in the context of regression interval mapping for multiple traits. Data on five traits from an F2 swine breed cross were used. FDR was implemented using tests at every 1 cM (FDR1) and using tests with the highest test statistic for each marker interval (FDRm). For the latter, a method was developed to predict comparison-wise error rates. At low error rates, FDR1 behaved erratically; FDRm was more stable but gave similar significance thresholds and number of QTL detected. At the same error rate, methods to control FDR gave less stringent significance thresholds and more QTL detected than methods to control GWER. Although testing across traits had limited impact on FDR, single-trait testing was recommended because there is no theoretical reason to pool tests across traits for FDR. FDR based on FDRm was recommended for QTL detection in interval mapping because it provides significance tests that are meaningful, yet not overly stringent, such that a more complete picture of QTL is revealed.
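The contrast between FDR control and genomewise error control can be sketched with two textbook procedures. The abstract does not state which procedures the authors implemented, so the following is an assumption for illustration: the Benjamini-Hochberg step-up procedure stands in for FDR control, a Bonferroni correction stands in for control of a genomewise (family-wise) error rate, and the p-values are invented.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its rank-scaled threshold.
        if p_values[i] <= rank * q / m:
            k_max = rank
    return sorted(order[:k_max])


def bonferroni(p_values, alpha=0.05):
    """Indices rejected when every test must pass the single threshold alpha / m."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


# Invented p-values, e.g. one per marker interval (illustration only).
p_values = [0.001, 0.008, 0.012, 0.019, 0.300, 0.450, 0.600, 0.750, 0.900, 0.950]

fdr_hits = benjamini_hochberg(p_values, q=0.05)
fwer_hits = bonferroni(p_values, alpha=0.05)

print("FDR (Benjamini-Hochberg):", fdr_hits)
print("Family-wise (Bonferroni):", fwer_hits)
```

With these made-up values, the FDR procedure declares four tests significant while the family-wise procedure declares one, which mirrors the abstract's observation that FDR control gives less stringent thresholds and more detections at the same nominal error rate.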
