Indonesian Spelling Error Detection and Type Identification Using Bigram Vector and Minimum Edit Distance Based Probabilities

SinkrOn ◽  
2021 ◽  
Vol 6 (1) ◽  
pp. 183-190
Author(s):  
Emmy Erwina ◽  
Tommy Tommy ◽  
Mayasari Mayasari

Spelling errors are increasingly common, as seen in word usage that follows trends or popular culture, especially among the younger generation. This study develops and tests a detection and identification model that combines a Bigram Vector with Minimum Edit Distance Based Probabilities. The correct word for each misspelled word is obtained through a candidate search and probability calculations that adopt the concept of minimum edit distance. Each detected error is then classified into one of three types (vowel, consonant, or diphthong) according to the characters the writer tends to use as a result of phonemic rendering at the time of writing. The results are reasonably good: most of the errors in the test data were detected and assigned the correct error type. Some detections were ambiguous, however, returning more than one correct word because the candidate words had the same probability value.
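The candidate search described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' model: the abstract does not give the bigram vector or the probability formula, so the sketch ranks candidates by raw minimum edit distance (Levenshtein) only. The function names and the tiny lexicon are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1 each."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def best_candidates(word: str, lexicon: list[str]) -> list[str]:
    """Return every lexicon word at the smallest edit distance from `word`."""
    scored = [(edit_distance(word, w), w) for w in lexicon]
    best = min(d for d, _ in scored)
    return sorted(w for d, w in scored if d == best)


# Hypothetical lexicon: "makn" is one insertion away from both "makan" and "makin".
print(best_candidates("makn", ["makan", "makin", "minum"]))
```

With this toy lexicon the search returns two candidates at the same distance, which mirrors the tie the abstract reports when several words share the same probability value. Breaking such ties would need extra evidence, such as word frequency or context.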

2019 ◽  
Vol 6 (2) ◽  
pp. 111-120
Author(s):  
Jamal Ali Omar

Abstract: The current study investigates the types and sources of spelling errors made by Kurdish EFL learners. To this end, 82 argumentative articles written by university-level students were analyzed to identify the spelling errors. The classification and identification of error types was based on Cook’s (1997) familiar categories of errors. The results showed that the error types were omission, insertion, substitution, transposition, space accuracy and capitalization. These errors originated from various sources, among which instructional, overgeneralization and pure were prominent. It was also concluded that letter/sound correspondence creates problems for the learners. Interpretation of the results implies that pedagogical decisions and further research are required in the learning context of Kurdish EFL learners.
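Four of the error types named above (omission, insertion, substitution, transposition) can be told apart mechanically when the intended word is known. The following Python sketch is an illustration under that assumption, not the study's procedure: the function name and the example words are hypothetical, it handles only single-edit errors, and it does not cover the space accuracy or capitalization categories.

```python
def classify_error(misspelled: str, target: str) -> str:
    """Classify a single-edit spelling error relative to the intended word."""
    if misspelled == target:
        return "correct"
    m, t = len(misspelled), len(target)
    if m == t:
        diffs = [i for i in range(m) if misspelled[i] != target[i]]
        if len(diffs) == 1:
            return "substitution"  # one letter replaced by another
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and misspelled[diffs[0]] == target[diffs[1]]
                and misspelled[diffs[1]] == target[diffs[0]]):
            return "transposition"  # two adjacent letters swapped
    elif m + 1 == t:
        # Deleting one letter of the target yields the misspelling.
        if any(target[:i] + target[i + 1:] == misspelled for i in range(t)):
            return "omission"
    elif m == t + 1:
        # Deleting one letter of the misspelling yields the target.
        if any(misspelled[:i] + misspelled[i + 1:] == target for i in range(m)):
            return "insertion"
    return "other"  # multiple edits or a category not handled here


for wrong, right in [("goverment", "government"), ("untill", "until"),
                     ("definate", "definite"), ("recieve", "receive")]:
    print(wrong, "->", classify_error(wrong, right))
```

Errors that combine several edits fall into "other" here; a fuller treatment would align the two words with an edit-distance backtrace and label each edit separately.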


2019 ◽  
Vol 26 (3) ◽  
pp. 211-218 ◽  
Author(s):  
Chris J Lu ◽  
Alan R Aronson ◽  
Sonya E Shooshan ◽  
Dina Demner-Fushman

Abstract. Objective: Automated understanding of consumer health inquiries might be hindered by misspellings. To detect and correct various types of spelling errors in consumer health questions, we developed a distributable spell-checking tool, CSpell, that handles nonword errors, real-word errors, word boundary infractions, punctuation errors, and combinations of the above. Methods: We developed a novel approach of using dual embedding within Word2vec for context-dependent corrections. This technique was used in combination with dictionary-based corrections in a 2-stage ranking system. We also developed various splitters and handlers to correct word boundary infractions. All correction approaches are integrated to handle errors in consumer health questions. Results: Our approach achieves an F1 score of 80.93% and 69.17% for spelling error detection and correction, respectively. Discussion: The dual-embedding model shows a significant improvement (9.13%) in F1 score compared with the general practice of using cosine similarity with word vectors in Word2vec for context ranking. Our 2-stage ranking system shows a 4.94% improvement in F1 score compared with the best 1-stage ranking system. Conclusion: CSpell improves over the state of the art and provides near real-time automatic misspelling detection and correction in consumer health questions. The software and the CSpell test set are available at https://umlslex.nlm.nih.gov/cSpell.
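The baseline that this abstract compares against, context ranking by cosine similarity over word vectors, can be sketched in a few lines of Python. This is not CSpell's dual-embedding method or its code: the function names are hypothetical, and the two-dimensional vectors are made-up toy values standing in for trained Word2vec embeddings.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_context(candidates, context_words, vectors):
    """Order candidate corrections by similarity to the mean context vector."""
    ctx = [vectors[w] for w in context_words if w in vectors]
    if not ctx:
        return list(candidates)  # no usable context: keep the input order
    dim = len(ctx[0])
    mean = [sum(v[k] for v in ctx) / len(ctx) for k in range(dim)]
    return sorted(candidates,
                  key=lambda c: cosine(vectors.get(c, [0.0] * dim), mean),
                  reverse=True)


# Toy vectors (assumed values, not trained embeddings).
vectors = {
    "dose": [0.9, 0.1],
    "does": [0.1, 0.9],
    "medication": [0.8, 0.2],
    "daily": [0.7, 0.3],
}
print(rank_by_context(["does", "dose"], ["medication", "daily"], vectors))
```

In this toy setup "dose" outranks "does" because its vector points in nearly the same direction as the averaged context. A real-word error such as "does" for "dose" passes a dictionary check, which is why a context score of this kind is needed at all.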


2017 ◽  
Vol 18 (1) ◽  
pp. 64-77 ◽  
Author(s):  
Alison B. Flynn ◽  
Ryan B. Featherstone

This study investigated students' successes, strategies, and common errors in their answers to questions that involved the electron-pushing (curved arrow) formalism (EPF), part of organic chemistry's language. We analyzed students' answers to two question types on midterms and final exams: (1) draw the electron-pushing arrows of a reaction step, given the starting materials and products; and (2) draw the products of a reaction step, given the starting materials and electron-pushing arrows. For both question types, students were given unfamiliar reactions. The goal was for students to gain proficiency, or fluency, in using and interpreting the EPF. By first becoming fluent, students should have lower cognitive load demands when learning subsequent concepts and reactions, positioning them to learn more deeply. Students did not typically draw reversed or illogical arrows, but there were many other error types. Scores on arrows questions were significantly higher than on products questions. Four factors correlated with lower question scores: compounds bearing implicit atoms, intramolecular reactions, assessment year, and the conformation of reactants drawn on the page. We found little evidence of analysis strategies such as expanding or mapping structures. We also found a new error type that we describe as picking up electrons and setting them down on a different atom. These errors revealed the difficulties that arose even before the students had to consider the chemical meaning and implications of the reactions. Herein, we describe our complete findings and suggestions for instruction, including videos that we created to teach the EPF.


2018 ◽  
Vol 12 (1) ◽  
pp. 67-82 ◽  
Author(s):  
Silvana Maria R. Watson ◽  
João Lopes ◽  
Célia Oliveira ◽  
Sharon Judge

Purpose: The purpose of this descriptive study is to investigate why some elementary children have difficulties mastering addition and subtraction calculation tasks. Design/methodology/approach: The researchers have examined error types in addition and subtraction calculation made by 697 Portuguese students in elementary grades. Each student completed a written assessment of mathematical knowledge. A system code (e.g. FR = failure to regroup) has been used to grade the tests. A reliability check has been performed on 65 per cent randomly selected exams. Findings: Data frequency analyses reveal that the most common type of error was miscalculation for both addition (n = 164; 38.6 per cent) and subtraction (n = 180; 21.7 per cent). The second most common error type was related to failure to regroup in addition (n = 74; 17.5 per cent) and subtraction (n = 139; 16.3 per cent). Frequency of error types by grade level has been provided. Findings from the hierarchical regression analyses indicate that students’ performance differences emerged as a function of error types which indicated students’ types of difficulties. Research limitations/implications: There are several limitations of this study: the use of a convenient sample; all schools were located in the northern region of Portugal; the limited number of problems; and the time of the year of assessment. Practical implications: Students’ errors suggested that their performance in calculation tasks is related to conceptual and procedural knowledge and skills. Error analysis allows teachers to better understand the individual performance of a diverse group and to tailor instruction to ensure that all students have an opportunity to succeed in mathematics. Social implications: Error analysis helps teachers uncover individual students’ difficulties and deliver meaningful instruction to all students. Originality/value: This paper adds to the international literature on error analysis and reinforces its value in diagnosing students’ type and severity of math difficulties.

