Spelling Errors in Korean Students’ Constructed Responses and the Efficacy of Automatic Spelling Correction on Automated Computer Scoring

Author(s):  
Hyeonju Lee ◽  
Minsu Ha ◽  
Jurim Lee ◽  
Rahmi Qurota Aini ◽  
Ai Nurlaelasari Rusmana ◽  
...  
Author(s):  
Jane Moon ◽  
Frada Burstein

There has been a paradigm shift in medical practice. More and more consumers are using the Internet as a source of medical information even before seeing a doctor. It is well known that medical terms are often hard to spell. Despite advances in technology, Internet searches still fail when the search terms are misspelled, and consumers are often frustrated by the irrelevant information they retrieve as a result. An ontology-based search is one way of helping users correct their spelling errors when searching for medical information. This chapter reviews the types of spelling errors that adults make and identifies current technology available to overcome the problem.


Information ◽  
2019 ◽  
Vol 10 (6) ◽  
pp. 202 ◽  
Author(s):  
Rui Dong ◽  
Yating Yang ◽  
Tonghai Jiang

This research was conducted to solve the out-of-vocabulary problem caused by Uyghur spelling errors in Uyghur–Chinese machine translation, and thereby to improve translation quality. This paper assesses three spelling correction methods based on machine translation: (1) using a Bilingual Evaluation Understudy (BLEU) score; (2) using a Chinese language model; and (3) using a bilingual language model. Using the BLEU score for spelling correction gave the best results in both the spelling correction task and the machine translation task. A maximum F1 score of 0.72 was reached for spelling correction, and the translation BLEU score increased by 1.97 points relative to the baseline system. However, using a BLEU score for spelling correction requires a bilingual parallel corpus, which makes it a supervised method that can be used in corpus pre-processing. Unsupervised spelling correction can be performed by using either a Chinese language model or a bilingual language model. These two methods can be easily extended to other languages, such as Arabic.
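The unsupervised, language-model-based selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the add-one-smoothed bigram model, the function names (`train_bigram_lm`, `log_prob`, `pick_correction`), and the toy English sentences standing in for Chinese translations are all assumptions made for illustration. The idea shown is only that each candidate spelling yields a translation, and the candidate whose translation the target-side language model scores highest is kept.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count unigram contexts and bigrams over whitespace-tokenized sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), {"<s>", "</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])             # every token that starts a bigram
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, len(vocab)


def log_prob(sentence, lm):
    """Log-probability of a sentence under the bigram model (add-one smoothing)."""
    contexts, bigrams, vocab_size = lm
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )


def pick_correction(candidate_translations, lm):
    """candidate_translations maps each candidate spelling to the (hypothetical)
    translation it produces; return the spelling with the best-scoring translation."""
    return max(candidate_translations,
               key=lambda cand: log_prob(candidate_translations[cand], lm))


# Toy target-side corpus (stand-in data, not from the paper).
lm = train_bigram_lm([
    "the cat sat on the mat",
    "the cat ate the fish",
    "a dog sat on the mat",
])
best = pick_correction(
    {"candidate_a": "the cat sat on the mat",
     "candidate_b": "the tac sat on the mat"},
    lm,
)
```

No parallel corpus is needed for this selection step, which is why the abstract classes the language-model variants as unsupervised.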


2020 ◽  
Author(s):  
Tae Hyeong Kim ◽  
Min Ji Kang ◽  
Se Ha Lee ◽  
Jong-Ho Kim ◽  
Hyung Joon Joo ◽  
...  

BACKGROUND Existing bacterial culture test results for infectious diseases are written in unrefined text, resulting in many problems including typographical errors and stop words. Effective spelling correction processes are needed to ensure the accuracy and reliability of data for the study of infectious diseases, including medical terminology extraction. If a dictionary is established, spelling algorithms using edit distance are efficient. However, in the absence of a dictionary, traditional spelling correction algorithms that utilize only edit distances have limitations. OBJECTIVE In this research, we proposed a similarity-based spelling correction algorithm using pre-trained word embedding with the BioWordVec technique. This method uses a character-level N-grams-based distributed representation through unsupervised learning rather than the existing rule-based method. In other words, we propose a framework that detects and corrects typographical errors when a dictionary is not in place. METHODS For detected typographical errors not mapped to SNOMED clinical terms, a correction candidate group with high similarity considering the edit distance was generated using pre-trained word embedding from the clinical database. From the embedding matrix in which the vocabulary is arranged in descending order according to frequency, a grid search is used to search for candidate groups of similar words. Then, the correction candidate words are ranked in consideration of the frequency of the words, and the typos are finally corrected according to the ranking. RESULTS Bacteria identification words were extracted from 27,544 bacteria culture reports, and 914 spelling errors of 16 types were found. The similarity-based spelling correction algorithm using BioWordVec proposed in this research corrected 12 types of typographical errors and showed very high performance in correcting 99.45% of all spelling errors. CONCLUSIONS This tool corrected spelling errors effectively in the absence of a dictionary based on bacterial identification words in the bacteria culture reports. This method will help build a high-quality refined database of vast text data for electronic health records.
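As a rough illustration of the edit-distance and frequency-ranking steps mentioned above, the sketch below generates candidates within a maximum Levenshtein distance and prefers the closest, then the most frequent, word. It deliberately omits the BioWordVec embedding similarity step, so it is not the authors' algorithm; the vocabulary counts, the `max_dist` threshold, and the function names are hypothetical.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def correct(word, freq, max_dist=2):
    """Return the best candidate from a frequency table, or the word unchanged.

    Candidates are ranked by smallest edit distance, then by highest frequency.
    """
    if word in freq:
        return word
    ranked = sorted(
        (levenshtein(word, cand), -count, cand)
        for cand, count in freq.items()
    )
    if ranked and ranked[0][0] <= max_dist:
        return ranked[0][2]
    return word


# Hypothetical frequency table of bacterial identification words.
freq = {"staphylococcus": 120, "streptococcus": 95, "escherichia": 200}
fixed = correct("staphylococus", freq)
```

In the setting the abstract describes, the frequency table would come from the corpus itself rather than from a curated dictionary, which is the case where pure edit distance is said to fall short.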


2011 ◽  
pp. 2244-2258
Author(s):  
Jane Moon

There has been a paradigm shift in medical practice. More and more consumers are using the Internet as a source of medical information even before seeing a doctor. It is well known that medical terms are often hard to spell. Despite advances in technology, Internet searches still fail when the search terms are misspelled, and consumers are often frustrated by the irrelevant information they retrieve as a result. An ontology-based search is one way of helping users correct their spelling errors when searching for medical information. This chapter reviews the types of spelling errors that adults make and identifies current technology available to overcome the problem.


10.2196/25530 ◽  
2021 ◽  
Vol 9 (2) ◽  
pp. e25530
Author(s):  
Taehyeong Kim ◽  
Sung Won Han ◽  
Minji Kang ◽  
Se Ha Lee ◽  
Jong-Ho Kim ◽  
...  

Background Existing bacterial culture test results for infectious diseases are written in unrefined text, resulting in many problems, including typographical errors and stop words. Effective spelling correction processes are needed to ensure the accuracy and reliability of data for the study of infectious diseases, including medical terminology extraction. If a dictionary is established, spelling algorithms using edit distance are efficient. However, in the absence of a dictionary, traditional spelling correction algorithms that utilize only edit distances have limitations. Objective In this research, we proposed a similarity-based spelling correction algorithm using pretrained word embedding with the BioWordVec technique. This method uses a character-level N-grams–based distributed representation through unsupervised learning rather than the existing rule-based method. In other words, we propose a framework that detects and corrects typographical errors when a dictionary is not in place. Methods For detected typographical errors not mapped to Systematized Nomenclature of Medicine (SNOMED) clinical terms, a correction candidate group with high similarity considering the edit distance was generated using pretrained word embedding from the clinical database. From the embedding matrix in which the vocabulary is arranged in descending order according to frequency, a grid search was used to search for candidate groups of similar words. Thereafter, the correction candidate words were ranked in consideration of the frequency of the words, and the typographical errors were finally corrected according to the ranking. Results Bacterial identification words were extracted from 27,544 bacterial culture and antimicrobial susceptibility reports, and 16 types of spelling errors and 914 misspelled words were found. The similarity-based spelling correction algorithm using BioWordVec proposed in this research corrected 12 types of typographical errors and showed very high performance in correcting 97.48% (based on F1 score) of all spelling errors. Conclusions This tool corrected spelling errors effectively in the absence of a dictionary based on bacterial identification words in bacterial culture and antimicrobial susceptibility reports. This method will help build a high-quality refined database of vast text data for electronic health records.
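For readers unfamiliar with the metric, the F1 score reported above is the harmonic mean of precision and recall. The sketch below shows the arithmetic only. How true positives, false positives, and false negatives are counted for spelling correction is an assumption stated in the comments, and the example counts are hypothetical, not taken from the paper.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall.

    Assumed counting scheme (one common convention, not confirmed by the source):
      tp = misspelled words changed to the right word
      fp = words changed to a wrong word
      fn = misspelled words left uncorrected
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only.
score = f1_score(tp=8, fp=2, fn=2)
```

Because F1 penalizes both wrong corrections and missed errors, it can come out lower than a plain "share of errors corrected" figure computed on the same data.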


2015 ◽  
Vol 41 (1) ◽  
pp. 175-183 ◽  
Author(s):  
Priscila A. Gimenes ◽  
Norton T. Roman ◽  
Ariadne M. Carvalho

Fifty years after Damerau compiled his statistics on the distribution of errors in typed texts, his findings are still applied across a range of languages. Because these statistics were derived from texts in English, the question of whether they actually apply to other languages has been raised. We address this issue by analyzing a set of typed texts in Brazilian Portuguese and deriving statistics tailored to this language. Results show that diacritical marks play a major role, as indicated by the frequency of mistakes involving them. This renders Damerau's original findings mostly unfit for spelling correction systems, although they remain useful if such marks are set aside. Furthermore, a comparison between these results and those published for Spanish shows no statistically significant differences between the two languages, an indication that the distribution of spelling errors depends on the adopted character set rather than the language itself.
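Damerau's classification covers four single-error types: insertion, deletion, substitution, and transposition of two adjacent characters. The sketch below labels a typed word against its intended form using those classes and, as an assumption added here to mirror the abstract's finding, reports a separate "diacritic" class when the two words differ only in diacritical marks. The function names and class labels are illustrative and not taken from the paper.

```python
import unicodedata


def strip_diacritics(text):
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def classify_error(typed, intended):
    """Label the difference between a typed word and the intended word."""
    if typed == intended:
        return "correct"
    # Assumed extra class: words identical once diacritical marks are set aside.
    if strip_diacritics(typed) == strip_diacritics(intended):
        return "diacritic"
    if len(typed) == len(intended):
        diffs = [i for i, (a, b) in enumerate(zip(typed, intended)) if a != b]
        if len(diffs) == 1:
            return "substitution"
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and typed[diffs[0]] == intended[diffs[1]]
                and typed[diffs[1]] == intended[diffs[0]]):
            return "transposition"
    elif len(typed) == len(intended) + 1:
        # One extra character was typed.
        if any(typed[:i] + typed[i + 1:] == intended for i in range(len(typed))):
            return "insertion"
    elif len(typed) + 1 == len(intended):
        # One character is missing from the typed word.
        if any(intended[:i] + intended[i + 1:] == typed for i in range(len(intended))):
            return "deletion"
    return "multiple"


labels = [classify_error(t, "casa") for t in ("cada", "csaa", "cassa", "csa", "xyz")]
```

Tallying these labels over a corpus of typed and intended word pairs gives the kind of per-class error distribution that the abstract compares across languages.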


2014 ◽  
Vol 21 (4) ◽  
pp. 333-354
Author(s):  
Kyu-cheol Yi ◽  
Keyword(s):  
