Granska API – an Online API for Grammar Checking and Other NLP Services

2021 ◽  
Author(s):  
Jonas Sjöbergh ◽  
Viggo Kann

We present an online API to access a number of Natural Language Processing services developed at KTH. The services work on Swedish text. They include tokenization, part-of-speech tagging, shallow parsing, compound word analysis, word inflection, lemmatization, spelling error detection and correction, grammar checking, and more. The services can be accessed in several ways, including a RESTful interface, direct socket communication, and premade Web forms. The services are open to anyone. The source code is also freely available, making it possible to set up another server or run the tools locally. We have also evaluated the performance of several of the services and compared them to other available systems. Both the precision and the recall of the Granska grammar checker are higher than those of Microsoft Word and Google Docs. The evaluation also shows that recall is greatly improved when all the grammar checking services in the API are combined, compared to any single method, and the API makes combining services easy.
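The effect of combining several checkers on recall can be illustrated with a minimal sketch. It does not call the Granska API; the function names, the two "checkers", and the error positions are hypothetical, and the union rule is only one possible way to combine detections.

```python
def precision_recall(flagged, gold):
    """Return (precision, recall) for a set of flagged error positions."""
    true_positives = len(flagged & gold)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def combine(*detections):
    """Union of the positions flagged by any of the checkers (assumed combination rule)."""
    return set().union(*detections)


# Hypothetical token positions of real errors and of each checker's detections.
gold = {3, 7, 12, 20}
checker_a = {3, 7}
checker_b = {12, 15}

combined = combine(checker_a, checker_b)
for name, flagged in [("A", checker_a), ("B", checker_b), ("A+B", combined)]:
    precision, recall = precision_recall(flagged, gold)
    print(f"{name}: precision={precision:.2f} recall={recall:.2f}")
```

In this toy data the union finds more of the true errors than either checker alone, while its precision falls between the two, because false alarms from each checker are also carried over.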

2015 ◽  
Vol 22 (5) ◽  
pp. 751-773 ◽  
Author(s):  
MOHAMMED ATTIA ◽  
PAVEL PECINA ◽  
YOUNES SAMIH ◽  
KHALED SHAALAN ◽  
JOSEF VAN GENABITH

A spelling error detection and correction application is typically based on three main components: a dictionary (or reference word list), an error model and a language model. While most of the attention in the literature has been directed to the language model, we show how improvements in any of the three components can lead to significant cumulative improvements in the overall performance of the system. We develop our dictionary of 9.2 million fully-inflected Arabic words (types) from a morphological transducer and a large corpus, validated and manually revised. We improve the error model by analyzing error types and creating an edit distance re-ranker. We also improve the language model by analyzing the level of noise in different data sources and selecting an optimal subset to train the system on. Testing and evaluation experiments show that our system significantly outperforms Microsoft Word 2013, OpenOffice Ayaspell 3.4 and Google Docs.
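How the three components might interact can be sketched as follows. This is not the authors' system: the word list and counts are hypothetical English toy data, plain Levenshtein distance stands in for the error model, and raw word frequency stands in for the language model.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def rank_candidates(word, word_counts, max_distance=2):
    """Rank dictionary words by edit distance, breaking ties by frequency."""
    scored = []
    for candidate, count in word_counts.items():
        distance = edit_distance(word, candidate)
        if distance <= max_distance:
            scored.append((distance, -count, candidate))
    return [candidate for _, _, candidate in sorted(scored)]


# Hypothetical dictionary with corpus counts.
WORD_COUNTS = {"spelling": 50, "spilling": 5, "selling": 20, "model": 40}

print(rank_candidates("speling", WORD_COUNTS))
```

The dictionary decides which candidates exist at all, the distance function decides which are plausible corrections, and the frequency tie-break orders equally plausible ones, which is why a gain in any one component can change the final ranking.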


2015 ◽  
Vol 764-765 ◽  
pp. 955-959
Author(s):  
Jui Feng Yeh ◽  
Cheng Hsien Lee ◽  
Yun Yun Lu ◽  
Guan Huei Wu ◽  
Yao Yi Wang

This paper proposes a method for spelling error detection and correction that uses linguistic features and knowledge resources. The linguistic features come mainly from a language model that describes the probability of a sentence. In practice, a formal document containing typos is defective and falls short of specifications. Because typos and errors hidden in printed documents are frequent, rework wastes paper and ink. The proposed approach addresses spelling errors before printing. In addition to the linguistic features, the method adds a new feature: an Internet search function based on knowledge bases. By combining these techniques, the paper aims to confirm and improve the detection rate of typos and to reduce the waste of resources. Experimental results show that the proposed method is practical and efficient for users detecting typos in printed documents.


2018 ◽  
Vol 39 (6) ◽  
pp. 1189-1220
Author(s):  
ANNA EVA HALLIN ◽  
CHRISTINA REUTERSKIÖLD

Grammatical error detection and correction are often used to test explicit language knowledge. This study investigated effects of token frequency and error type in error detection, correction, and repetition, and performance on the three tasks was compared and related to models of metalinguistic awareness and development. Thirty Swedish-speaking 10-year-olds with typical language development participated in the study, which focused on four morphosyntactic errors: the infinitive instead of past tense for regular and irregular verbs, and the omission of the obligatory indefinite article in common and neuter gender noun phrases. Target verbs and nouns were of high or low frequency. Results showed significant effects of verb frequency in all tasks, and effects of noun gender for error detection, but not for correction and repetition. Children detected significantly more past-tense errors than they accurately corrected, but the opposite result was seen for noun phrase errors. The patterns of results both within and across tasks imply that implicit language knowledge affects performance, and that lexical frequency, even of familiar words, needs to be controlled when designing tasks for measuring grammatical knowledge. The particular challenge of the Swedish neuter noun phrase in language development and language processing needs to be further investigated.


2019 ◽  
Vol 8 (2) ◽  
pp. 6111-6116

Digitization of local languages is gaining importance, and language processing is becoming popular among linguists and IT professionals. Most people are most comfortable with their native language. Writing correct word forms on digital platforms also matters for the future existence of a language. In this research, Assamese is the natural language processed in the experiments. Assamese is one of the Indian languages, and its research and development is ongoing; from a computational point of view, Assamese is still in the development phase. Assamese has some characters that are phonetically the same but have different glyphs. These characters often confuse users while writing, and they are given special consideration in this work. A list of 14 confusing character pairs of Assamese letters is used for the experiments. In addition, this work focuses on errors in Assamese words, which are checked using bigram and trigram models. The proposed model also tries to find the erroneous character that makes a word incorrect and shows suggestions for that character. A score-based system is designed for Assamese characters: each character is assigned a score based on its probability of occurrence under bigram and trigram language models. Different types of experiments are performed to check the correctness of Assamese words, and the proposed model checks correctness with accuracy ranging from 81% to 86%. The error rate in Assamese can be reduced by using this model on any digital platform where a user can type in Assamese.
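Character-level scoring of this kind can be sketched roughly as follows. The example is an assumption-laden illustration, not the paper's model: it uses only bigrams, unsmoothed relative frequencies, a hypothetical four-word Latin-script training list instead of Assamese data, and `^`/`$` as made-up word boundary markers.

```python
from collections import Counter


def train_bigrams(words):
    """Count character bigrams, with '^' and '$' marking word boundaries."""
    pair_counts = Counter()
    context_counts = Counter()
    for word in words:
        padded = "^" + word + "$"
        for left, right in zip(padded, padded[1:]):
            pair_counts[(left, right)] += 1
            context_counts[left] += 1
    return pair_counts, context_counts


def score_characters(word, pair_counts, context_counts):
    """Score each character by P(character | previous character)."""
    padded = "^" + word
    scores = []
    for left, right in zip(padded, padded[1:]):
        seen = context_counts[left]
        scores.append(pair_counts[(left, right)] / seen if seen else 0.0)
    return scores


def most_suspicious_index(word, pair_counts, context_counts):
    """Index of the lowest-scoring character (first one on ties)."""
    scores = score_characters(word, pair_counts, context_counts)
    return min(range(len(scores)), key=scores.__getitem__)


# Hypothetical training words standing in for a real corpus.
pair_counts, context_counts = train_bigrams(["cat", "car", "can", "cot"])

print(score_characters("cat", pair_counts, context_counts))
print(most_suspicious_index("cxt", pair_counts, context_counts))
```

A character that never follows its left neighbour in the training data receives a score of zero and is flagged first; suggestions could then be drawn from the characters that do follow that neighbour, for example the other member of a confusing pair.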


2019 ◽  
Vol 26 (3) ◽  
pp. 211-218 ◽  
Author(s):  
Chris J Lu ◽  
Alan R Aronson ◽  
Sonya E Shooshan ◽  
Dina Demner-Fushman

Objective: Automated understanding of consumer health inquiries might be hindered by misspellings. To detect and correct various types of spelling errors in consumer health questions, we developed a distributable spell-checking tool, CSpell, that handles nonword errors, real-word errors, word boundary infractions, punctuation errors, and combinations of the above. Methods: We developed a novel approach of using dual embedding within Word2vec for context-dependent corrections. This technique was used in combination with dictionary-based corrections in a 2-stage ranking system. We also developed various splitters and handlers to correct word boundary infractions. All correction approaches are integrated to handle errors in consumer health questions. Results: Our approach achieves an F1 score of 80.93% and 69.17% for spelling error detection and correction, respectively. Discussion: The dual-embedding model shows a significant improvement (9.13%) in F1 score compared with the general practice of using cosine similarity with word vectors in Word2vec for context ranking. Our 2-stage ranking system shows a 4.94% improvement in F1 score compared with the best 1-stage ranking system. Conclusion: CSpell improves over the state of the art and provides near real-time automatic misspelling detection and correction in consumer health questions. The software and the CSpell test set are available at https://umlslex.nlm.nih.gov/cSpell.
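The general idea of ranking in two stages can be sketched as below. This is not CSpell's implementation: the candidate scores and two-dimensional vectors are invented, the cut-off `keep` is a hypothetical parameter, and the second stage uses plain cosine similarity (the baseline the abstract compares against), not the dual-embedding model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def two_stage_rank(candidates, context_vector, vectors, keep=2):
    """Stage 1: shortlist by dictionary score. Stage 2: re-rank by context."""
    shortlist = sorted(candidates, key=lambda item: item[1], reverse=True)[:keep]
    return sorted((word for word, _ in shortlist),
                  key=lambda word: cosine(vectors[word], context_vector),
                  reverse=True)


# Hypothetical candidates for a misspelling, with dictionary-based scores.
candidates = [("does", 0.9), ("dose", 0.8), ("doze", 0.3)]
# Hypothetical word vectors and a context vector leaning toward "dose".
vectors = {"does": [0.0, 1.0], "dose": [1.0, 0.0], "doze": [0.7, 0.7]}
context_vector = [1.0, 0.2]

print(two_stage_rank(candidates, context_vector, vectors))
```

Here the dictionary score alone would prefer "does", but the context stage promotes "dose"; the first stage mainly serves to keep implausible candidates such as "doze" out of the context comparison.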

