Granska API – an Online API for Grammar Checking and Other NLP Services

Keyword(s):

Shallow Parsing ◽

Set Up ◽

Swedish Text

We present an online API to access a number of Natural Language Processing services developed at KTH. The services work on Swedish text. They include tokenization, part-of-speech tagging, shallow parsing, compound word analysis, word inflection, lemmatization, spelling error detection and correction, grammar checking, and more. The services can be accessed in several ways, including a RESTful interface, direct socket communication, and premade Web forms. The services are open to anyone. The source code is also freely available, making it possible to set up another server or run the tools locally. We have also evaluated the performance of several of the services and compared them to other available systems. The Granska grammar checker achieves both higher precision and higher recall than Microsoft Word and Google Docs. The evaluation also shows that combining all the grammar checking services in the API greatly improves recall compared to any single method, and the API makes combining services easy.
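The recall gain from combining checkers comes from taking the union of the errors each one flags. Below is a minimal sketch of that arithmetic. It assumes each checker's output can be reduced to a set of flagged error positions and that a gold set of annotated errors exists. The checker names and all numbers are hypothetical toy values, not the Granska evaluation data, and nothing here calls the actual API.

```python
def precision_recall(flagged: set[int], gold: set[int]) -> tuple[float, float]:
    """Precision and recall of flagged error positions against gold-standard annotations."""
    true_positives = len(flagged & gold)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical gold errors and checker outputs, given as token positions in one text.
gold = {3, 8, 15, 21, 30}
checkers = {
    "rule_based": {3, 8, 12},
    "statistical": {15, 21},
    "spelling": {8, 30},
}

for name, flagged in checkers.items():
    p, r = precision_recall(flagged, gold)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")

# Combining services: report every position flagged by at least one checker.
combined = set().union(*checkers.values())
p, r = precision_recall(combined, gold)
print(f"combined: precision={p:.2f} recall={r:.2f}")  # combined: precision=0.83 recall=1.00
```

Recall of the union can never fall below that of any single checker. Precision can drop, though, because every checker's false alarms are kept as well (position 12 here).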

Arabic spelling error detection and correction

Natural Language Engineering ◽

10.1017/s1351324915000030 ◽

2015 ◽

Vol 22 (5) ◽

pp. 751-773 ◽

Cited By ~ 4

Author(s):

MOHAMMED ATTIA ◽

PAVEL PECINA ◽

YOUNES SAMIH ◽

KHALED SHAALAN ◽

JOSEF VAN GENABITH

Keyword(s):

Error Detection ◽

Language Model ◽

Word List ◽

Error Model ◽

Spelling Error ◽

Google Docs ◽

Microsoft Word ◽

Optimal Subset ◽

Main Components

A spelling error detection and correction application is typically based on three main components: a dictionary (or reference word list), an error model and a language model. While most of the attention in the literature has been directed to the language model, we show how improvements in any of the three components can lead to significant cumulative improvements in the overall performance of the system. We build a dictionary of 9.2 million fully inflected Arabic words (types) from a morphological transducer and a large corpus, and validate and manually revise it. We improve the error model by analyzing error types and creating an edit distance re-ranker. We also improve the language model by analyzing the level of noise in different data sources and selecting an optimal subset to train the system on. Testing and evaluation experiments show that our system significantly outperforms Microsoft Word 2013, OpenOffice Ayaspell 3.4 and Google Docs.
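The three components named above can be combined in a noisy-channel ranking, sketched below as a toy illustration. This is not the authors' Arabic system. The word list and counts are hypothetical, and a unigram frequency model stands in for the language model. A fixed, hypothetical per-edit penalty on plain Levenshtein distance stands in for the learned error model and edit distance re-ranker.

```python
import math
from collections import Counter

def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

# Hypothetical dictionary with corpus counts; the counts double as a unigram language model.
WORD_COUNTS = Counter({"spelling": 50, "spilling": 5, "selling": 30, "error": 80, "errors": 40})
TOTAL = sum(WORD_COUNTS.values())
EDIT_PENALTY = math.log(0.01)  # hypothetical log-probability cost per edit (error model)

def correct(word: str, max_distance: int = 2) -> list[tuple[str, float]]:
    """Rank dictionary words by error-model score plus language-model log-probability."""
    if word in WORD_COUNTS:  # dictionary lookup: known words are not flagged
        return [(word, 0.0)]
    scored = []
    for candidate, count in WORD_COUNTS.items():
        distance = levenshtein(word, candidate)
        if distance <= max_distance:
            score = distance * EDIT_PENALTY + math.log(count / TOTAL)
            scored.append((candidate, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

print([w for w, _ in correct("speling")])  # ['spelling', 'selling', 'spilling']
```

Improving any one part (a larger word list, a better error model, a cleaner language model) changes the candidate set or the scores independently of the others. This fits the abstract's point that gains in the three components add up.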

Data-driven spell checking: The synergy of two algorithms for spelling error detection and correction

International Conference on Advances in ICT for Emerging Regions (ICTer2012) ◽

10.1109/icter.2012.6422063 ◽

2012 ◽

Cited By ~ 3

Author(s):

Eranga Jayalatharachchi ◽

Asanka Wasala ◽

Ruvan Weerasinghe

Keyword(s):

Error Detection ◽

Data Driven ◽

Spelling Error ◽

SPELLING ERROR DETECTION AND CORRECTION BY COMPUTER: SOME NOTES AND A BIBLIOGRAPHY

Journal of Documentation ◽

10.1108/eb026733 ◽

1982 ◽

Vol 38 (4) ◽

pp. 282-291 ◽

Cited By ~ 16

Author(s):

J.J. POLLOCK

Keyword(s):

Error Detection ◽

Spelling Error ◽

Arib@QALB-2015 Shared Task: A Hybrid Cascade Model for Arabic Spelling Error Detection and Correction

10.18653/v1/w15-3214 ◽

2015 ◽

Cited By ~ 1

Author(s):

Nouf AlShenaifi ◽

Rehab AlNefie ◽

Maha Al-Yahya ◽

Hend Al-Khalifa

Keyword(s):

Error Detection ◽

Cascade Model ◽

Spelling Error ◽

Shared Task ◽

Improving post-processing optical character recognition documents with Arabic language using spelling error detection and correction

International Journal of Reasoning-based Intelligent Systems ◽

10.1504/ijris.2016.082957 ◽

2016 ◽

Vol 8 (3/4) ◽

pp. 91

Author(s):

Iyad Abu Doush ◽

Ahmed M. Al Trad

Keyword(s):

Error Detection ◽

Character Recognition ◽

Optical Character Recognition ◽

Arabic Language ◽

Spelling Error ◽

Post Processing ◽

Optical Character ◽

Spelling Check Combined Language Models and Knowledge Resources for Printer Drivers

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.764-765.955 ◽

2015 ◽

Vol 764-765 ◽

pp. 955-959

Author(s):

Jui Feng Yeh ◽

Cheng Hsien Lee ◽

Yun Yun Lu ◽

Guan Huei Wu ◽

Yao Yi Wang

Keyword(s):

Error Detection ◽

Detection Rate ◽

Language Model ◽

Knowledge Bases ◽

Language Models ◽

Spelling Error ◽

Linguistic Features ◽

Knowledge Resources ◽

New Feature

This paper proposes a spelling error detection and correction method that uses linguistic features and knowledge resources. The linguistic features mainly come from a language model that describes the probability of a sentence. In practice, a formal document with typos is defective and falls short of specifications, and because typos and errors hidden in printed documents are frequent, reprinting wastes paper and ink. This paper proposes an approach that addresses spelling errors before printing. In this method, the linguistic features are complemented by an additional new feature: a function of Internet search based on knowledge bases. By combining these approaches, the paper aims to confirm typos, improve their detection rate, and reduce the waste of resources. Experimental results show that the proposed method is practicable and efficient for users to detect typos in printed documents.

Effects of frequency and morphosyntactic structure on error detection, correction, and repetition in Swedish-speaking children

Applied Psycholinguistics ◽

10.1017/s0142716418000280 ◽

2018 ◽

Vol 39 (6) ◽

pp. 1189-1220

Author(s):

ANNA EVA HALLIN ◽

CHRISTINA REUTERSKIÖLD

Keyword(s):

Language Development ◽

Language Processing ◽

Noun Phrase ◽

Error Detection ◽

Low Frequency ◽

Past Tense ◽

Error Type ◽

Indefinite Article ◽

Language Knowledge

Grammatical error detection and correction are often used to test explicit language knowledge. This study investigated effects of token frequency and error type in error detection, correction, and repetition, and performance on the three tasks was compared and related to models of metalinguistic awareness and development. Thirty Swedish-speaking 10-year-olds with typical language development participated in the study, which focused on four morphosyntactic errors: the infinitive instead of past tense for regular and irregular verbs, and the omission of the obligatory indefinite article in common and neuter gender noun phrases. Target verbs and nouns were of high or low frequency. Results showed significant effects of verb frequency in all tasks, and effects of noun gender for error detection, but not for correction and repetition. Children detected significantly more past-tense errors than they accurately corrected, but the opposite result was seen for noun phrase errors. The patterns of results both within and across tasks imply that implicit language knowledge affects performance, and that lexical frequency, even of familiar words, needs to be controlled when designing tasks for measuring grammatical knowledge. The particular challenge of the Swedish neuter noun phrase in language development and language processing needs to be further investigated.

A statistical model for automatic Error Detection and Correction of Assamese Words

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b3859.078219 ◽

2019 ◽

Vol 8 (2) ◽

pp. 6111-6116

Keyword(s):

Language Processing ◽

Error Detection ◽

Mother Tongue ◽

Research Work ◽

Language Models ◽

Indian Languages ◽

Computational Point ◽

Proposed Model ◽

Assamese Language

Digitization of local languages is gaining importance, and language processing tasks are becoming popular among linguists and IT professionals. Most people are most comfortable in their native mother tongue, and writing correct word forms on digital platforms is important for the future existence of a language. In this research, Assamese is the natural language processed in the experiments. Assamese is one of the Indian languages, and research and development on it is ongoing; from a computational point of view, Assamese is still in the development phase. Assamese has some characters that are phonetically the same but have different glyphs. These characters often confuse users while writing, and they receive special consideration in this work. A list of 14 confusing character pairs of Assamese letters is used in the experiments. In addition, this work focuses on errors in Assamese words, which are checked using bigram and trigram models. The proposed model also tries to find the erroneous character that makes a word incorrect and suggests replacements for that character. A score-based system is designed for Assamese characters, in which each character is assigned a score from its probability of occurrence under bigram and trigram language models. Several experiments check the correctness of Assamese words, and the proposed model does so with accuracy ranging from 81% to 86%. The error rate in Assamese can be reduced by using this model on any digital platform where users type in Assamese.
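Below is a sketch of the character-level idea: score each character by an n-gram probability, locate the least likely one, and try its confusable partner. Latin letters stand in for Assamese characters. The training words and the single confusable pair are hypothetical (the paper uses 14 pairs). The sketch uses only add-one smoothed bigrams, whereas the paper also uses trigrams. Flagging a whole word would compare `word_score` against a tuned threshold, which is omitted here.

```python
import math
from collections import Counter

# Hypothetical training words; Latin letters stand in for Assamese characters.
TRAINING_WORDS = ["sara", "sami", "rasa", "masa", "sira"]
# Hypothetical confusable pair: letters that sound alike but have different glyphs.
CONFUSABLE = {"s": "z", "z": "s"}

def char_bigrams(word: str) -> list[tuple[str, str]]:
    padded = "^" + word + "$"  # word-boundary markers
    return list(zip(padded, padded[1:]))

BIGRAMS = Counter(bg for w in TRAINING_WORDS for bg in char_bigrams(w))
CONTEXTS = Counter(ch for w in TRAINING_WORDS for ch in "^" + w)  # counts of left contexts
VOCAB_SIZE = len(set("".join(TRAINING_WORDS)) | {"^", "$"} | set(CONFUSABLE))

def bigram_logprob(prev: str, ch: str) -> float:
    """Add-one smoothed log P(ch | prev)."""
    return math.log((BIGRAMS[(prev, ch)] + 1) / (CONTEXTS[prev] + VOCAB_SIZE))

def word_score(word: str) -> float:
    """Log-probability of the whole word, boundaries included."""
    return sum(bigram_logprob(p, c) for p, c in char_bigrams(word))

def suspicious_position(word: str) -> int:
    """Index of the character whose incoming bigram is least probable."""
    scores = [bigram_logprob(p, c) for p, c in char_bigrams(word)[:-1]]
    return min(range(len(scores)), key=scores.__getitem__)

def suggest(word: str) -> str:
    """Swap one confusable character at a time and keep the best-scoring variant."""
    best = word
    for i, ch in enumerate(word):
        if ch in CONFUSABLE:
            variant = word[:i] + CONFUSABLE[ch] + word[i + 1:]
            if word_score(variant) > word_score(best):
                best = variant
    return best

print(suspicious_position("zara"), suggest("zara"))  # 0 sara
```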

Improving post-processing optical character recognition documents with Arabic language using spelling error detection and correction

International Journal of Reasoning-based Intelligent Systems ◽

10.1504/ijris.2016.10003960 ◽

2016 ◽

Vol 8 (3/4) ◽

pp. 91 ◽

Cited By ~ 1

Author(s):

Ahmed M. Al Trad ◽

Iyad Abu Doush

Keyword(s):

Error Detection ◽

Character Recognition ◽

Optical Character Recognition ◽

Arabic Language ◽

Spelling Error ◽

Post Processing ◽

Optical Character ◽

Spell checker for consumer language (CSpell)

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocy171 ◽

2019 ◽

Vol 26 (3) ◽

pp. 211-218 ◽

Cited By ~ 7

Author(s):

Chris J Lu ◽

Alan R Aronson ◽

Sonya E Shooshan ◽

Dina Demner-Fushman

Keyword(s):

Error Detection ◽

State Of The Art ◽

Word Boundary ◽

Spelling Error ◽

Consumer Health ◽

Correct Word ◽

Ranking System ◽

Novel Approach ◽

Spell Checker

Objective: Automated understanding of consumer health inquiries might be hindered by misspellings. To detect and correct various types of spelling errors in consumer health questions, we developed a distributable spell-checking tool, CSpell, that handles nonword errors, real-word errors, word boundary infractions, punctuation errors, and combinations of the above.

Methods: We developed a novel approach of using dual embedding within Word2vec for context-dependent corrections. This technique was used in combination with dictionary-based corrections in a 2-stage ranking system. We also developed various splitters and handlers to correct word boundary infractions. All correction approaches are integrated to handle errors in consumer health questions.

Results: Our approach achieves F1 scores of 80.93% and 69.17% for spelling error detection and correction, respectively.

Discussion: The dual-embedding model shows a significant improvement (9.13%) in F1 score compared with the general practice of using cosine similarity with word vectors in Word2vec for context ranking. Our 2-stage ranking system shows a 4.94% improvement in F1 score compared with the best 1-stage ranking system.

Conclusion: CSpell improves over the state of the art and provides near real-time automatic misspelling detection and correction in consumer health questions. The software and the CSpell test set are available at https://umlslex.nlm.nih.gov/cSpell.
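Below is a rough sketch of a 2-stage ranking in Python. CSpell itself is a separate tool, and none of these names are its API. Stage 1 shortlists dictionary words by spelling similarity, using the standard-library `difflib.get_close_matches`. Stage 2 picks among the shortlist by context. In word2vec each word has an input (IN) vector and an output (OUT) vector. Here a candidate's OUT vector is scored against the averaged IN vectors of its context, mirroring how word2vec's CBOW variant predicts a centre word. Whether CSpell combines the two matrices exactly this way is an assumption. The vectors, word list and cutoff are hypothetical toy values.

```python
from difflib import get_close_matches

# Hypothetical toy embeddings (2 dimensions for readability); a trained model learns both tables.
IN_VECTORS = {
    "high": [1.0, 0.0],
    "pressure": [0.8, 0.2],
    "eat": [0.0, 1.0],
    "healthy": [0.2, 0.8],
}
OUT_VECTORS = {
    "blood": [0.9, 0.1],
    "food": [0.1, 0.9],
    "flood": [0.5, 0.0],
}
DICTIONARY = list(OUT_VECTORS)  # hypothetical word list

def orthographic_candidates(word: str, k: int = 2, cutoff: float = 0.7) -> list[str]:
    """Stage 1: up to k dictionary words with spelling similarity >= cutoff, best first."""
    return get_close_matches(word, DICTIONARY, n=k, cutoff=cutoff)

def context_score(candidate: str, context: list[str]) -> float:
    """Dot product of the candidate's OUT vector with the mean IN vector of known context words."""
    vectors = [IN_VECTORS[w] for w in context if w in IN_VECTORS]
    if not vectors:
        return 0.0
    mean = [sum(column) / len(vectors) for column in zip(*vectors)]
    return sum(m * o for m, o in zip(mean, OUT_VECTORS[candidate]))

def correct(word: str, context: list[str]) -> str:
    """Stage 2: among the stage-1 shortlist, pick the candidate that best fits the context."""
    if word in DICTIONARY:
        return word
    candidates = orthographic_candidates(word)
    # max() keeps the first (best-spelled) candidate on ties, e.g. when no context word is known.
    return max(candidates, key=lambda c: context_score(c, context), default=word)

print(correct("bood", ["high", "pressure"]))  # blood
print(correct("bood", ["eat", "healthy"]))    # food
```

Spelling similarity alone would always choose "blood" for "bood". The context stage lets "food" win when the surrounding words point that way, and "flood" never reaches stage 2 because it falls below the cutoff.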