A hybrid model for spelling error detection and correction for Urdu language

Author(s):  
Romila Aziz ◽  
Muhammad Waqas Anwar ◽  
Muhammad Hasan Jamal ◽  
Usama Ijaz Bajwa


2015 ◽
Vol 764-765 ◽  
pp. 955-959
Author(s):  
Jui Feng Yeh ◽  
Cheng Hsien Lee ◽  
Yun Yun Lu ◽  
Guan Huei Wu ◽  
Yao Yi Wang

This paper proposes spelling error detection and correction using linguistic features and a knowledge resource. The linguistic features mainly come from a language model that describes the probability of a sentence. In practice, a formal document containing typos is defective and falls short of specifications; since typos and errors hidden in printed documents are frequent, rework wastes paper and ink. This paper proposes an approach that addresses spelling errors before printing. In this method, the linguistic features are compared and augmented with an additional feature: an Internet search function based on knowledge bases. By combining these approaches, this paper aims to improve the detection rate of typos and to reduce the waste of resources. Experimental results show that the proposed method is practical and efficient for users detecting typos in printed documents.
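The abstract's core idea, that a language model assigns a probability to a sentence and improbable words signal typos, can be illustrated with a minimal sketch. The toy corpus, smoothing constant, and threshold below are all hypothetical stand-ins, not the paper's actual model or data:

```python
from collections import Counter

# Toy corpus standing in for the language model's training data (hypothetical).
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_prob(prev, word, alpha=1.0):
    """Add-alpha smoothed P(word | prev) from the toy corpus."""
    vocab = len(unigrams)
    return (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab)

def flag_unlikely(sentence, threshold=0.1):
    """Flag words whose conditional probability falls below the threshold."""
    words = sentence.split()
    return [w for prev, w in zip(words, words[1:])
            if bigram_prob(prev, w) < threshold]
```

On this toy data, a misspelling such as "mta" after "the" receives a near-zero bigram count and is flagged, while the in-vocabulary "mat" passes; a real system would add the knowledge-based lookup the abstract describes as a second signal.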


2019 ◽  
Vol 26 (3) ◽  
pp. 211-218 ◽  
Author(s):  
Chris J Lu ◽  
Alan R Aronson ◽  
Sonya E Shooshan ◽  
Dina Demner-Fushman

Abstract
Objective Automated understanding of consumer health inquiries might be hindered by misspellings. To detect and correct various types of spelling errors in consumer health questions, we developed a distributable spell-checking tool, CSpell, that handles nonword errors, real-word errors, word boundary infractions, punctuation errors, and combinations of the above.
Methods We developed a novel approach of using dual embedding within Word2vec for context-dependent corrections. This technique was used in combination with dictionary-based corrections in a 2-stage ranking system. We also developed various splitters and handlers to correct word boundary infractions. All correction approaches are integrated to handle errors in consumer health questions.
Results Our approach achieves an F1 score of 80.93% and 69.17% for spelling error detection and correction, respectively.
Discussion The dual-embedding model shows a significant improvement (9.13%) in F1 score compared with the general practice of using cosine similarity with word vectors in Word2vec for context ranking. Our 2-stage ranking system shows a 4.94% improvement in F1 score compared with the best 1-stage ranking system.
Conclusion CSpell improves over the state of the art and provides near real-time automatic misspelling detection and correction in consumer health questions. The software and the CSpell test set are available at https://umlslex.nlm.nih.gov/cSpell.
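The 2-stage design, first generating dictionary candidates, then re-ranking them by context, can be sketched in miniature. This is not CSpell's implementation: the dictionary, word frequencies, and 2-dimensional "embeddings" below are hypothetical toy values, and plain cosine similarity stands in for the paper's dual-embedding scoring:

```python
import math

# Hypothetical toy dictionary: word -> corpus frequency.
DICTIONARY = {"headache": 50, "heartache": 10, "head": 80}

# Toy 2-d context vectors standing in for Word2vec embeddings (hypothetical).
VEC = {"headache": [0.9, 0.1], "heartache": [0.2, 0.9], "pain": [0.8, 0.2]}

def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance (rolling array)."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[len(b)]

def cosine(u, v):
    """Cosine similarity; returns 0 for a zero vector."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else sum(a * b for a, b in zip(u, v)) / (nu * nv)

def correct(word, context_word, max_dist=2):
    # Stage 1: dictionary candidates within the edit-distance budget.
    candidates = [w for w in DICTIONARY if edit_distance(word, w) <= max_dist]
    # Stage 2: re-rank candidates by similarity to the surrounding context.
    return max(candidates,
               key=lambda w: cosine(VEC.get(w, [0.0, 0.0]), VEC[context_word]))
```

Here "heatache" is one edit away from both "headache" and "heartache", so stage 1 alone cannot decide; a context word like "pain" lets stage 2 pick "headache", which is the kind of disambiguation the context-ranking stage provides.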


2021 ◽  
Author(s):  
Jonas Sjöbergh ◽  
Viggo Kann

We present an online API to access a number of Natural Language Processing services developed at KTH. The services work on Swedish text. They include tokenization, part-of-speech tagging, shallow parsing, compound word analysis, word inflection, lemmatization, spelling error detection and correction, grammar checking, and more. The services can be accessed in several ways, including a RESTful interface, direct socket communication, and premade Web forms. The services are open to anyone. The source code is also freely available, making it possible to set up another server or run the tools locally. We have also evaluated the performance of several of the services and compared them to other available systems. Both precision and recall for the Granska grammar checker are higher than for Microsoft Word and Google Docs. The evaluation also shows that recall is greatly improved when combining all the grammar checking services in the API, compared to any one method, and combining services is made easy by the API.
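A client for such a RESTful service typically just composes a query URL and parses a JSON reply. The endpoint URL and the response shape below are hypothetical placeholders for illustration, not the actual KTH API:

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint; the real service URL would come from the API docs.
BASE_URL = "https://example.org/api/spellcheck"

def build_request_url(text, lang="sv"):
    """Compose the query URL for a spell-check request."""
    return BASE_URL + "?" + urlencode({"text": text, "lang": lang})

def parse_response(body):
    """Extract (misspelling, suggestions) pairs from a hypothetical JSON reply."""
    payload = json.loads(body)
    return [(e["word"], e["suggestions"]) for e in payload.get("errors", [])]
```

A reply such as {"errors": [{"word": "huns", "suggestions": ["hund", "hus"]}]} would then yield the flagged word together with its ranked suggestions, which a client could feed into the combined grammar-checking pipeline the abstract describes.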


2015 ◽  
Vol 22 (5) ◽  
pp. 751-773 ◽  
Author(s):  
Mohammed Attia ◽  
Pavel Pecina ◽  
Younes Samih ◽  
Khaled Shaalan ◽  
Josef van Genabith

Abstract
A spelling error detection and correction application is typically based on three main components: a dictionary (or reference word list), an error model and a language model. While most of the attention in the literature has been directed to the language model, we show how improvements in any of the three components can lead to significant cumulative improvements in the overall performance of the system. We develop our dictionary of 9.2 million fully inflected Arabic words (types) from a morphological transducer and a large corpus, validated and manually revised. We improve the error model by analyzing error types and creating an edit distance re-ranker. We also improve the language model by analyzing the level of noise in different data sources and selecting an optimal subset to train the system on. Testing and evaluation experiments show that our system significantly outperforms Microsoft Word 2013, OpenOffice Ayaspell 3.4 and Google Docs.
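The three components named above fit naturally into a noisy-channel score: the dictionary supplies candidates, the error model penalizes edits, and the language model favors frequent words. The sketch below illustrates that decomposition with toy words, counts, and a hypothetical penalty weight, not the paper's actual models:

```python
import math

# Hypothetical dictionary with toy corpus counts (the language model here is
# just unigram frequency; the paper's LM is far richer).
DICTIONARY = {"receive": 120, "recipe": 40, "relieve": 30}
TOTAL = sum(DICTIONARY.values())

def edit_distance(a, b):
    """Levenshtein distance, used as a stand-in error model."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[len(b)]

def correct(typo, lam=1.0):
    """Pick the candidate maximizing log P(word) - lam * edit_distance."""
    return max(DICTIONARY,
               key=lambda w: math.log(DICTIONARY[w] / TOTAL)
                             - lam * edit_distance(typo, w))
```

For "recieve", the candidate "relieve" is actually closer in raw edit distance (1 vs 2), but the frequency term outweighs the extra edit and "receive" wins; this interplay is why improving any one of the three components, as the abstract argues, shifts the final ranking.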

