Rule Based Shallow Parser for Arabic Language

2011 ◽  
Vol 7 (10) ◽  
pp. 1505-1514 ◽  
Author(s):  
Mohammed
2021 ◽  
Vol 3 (32) ◽  
pp. 05-35
Author(s):  
Hashem Alsharif ◽  

There exist no corpora of Arabic nouns. Furthermore, nouns can appear in different forms in any Arabic text. Tagging the nouns in an Arabic text also makes it possible to determine whether each sentence begins with a noun or a verb. Part-of-speech (POS) tagging is the task of labeling each word in a sentence with its appropriate category, called a tag (Noun, Verb, or Article). In this thesis, we attempt to tag non-vocalized Arabic text. The proposed POS tagger for Arabic text searches for each word of the text in our lists of verbs and articles, and nouns are found by elimination: our hypothesis states that if a word in the text is not found in our lists, then it is a noun. This comparison is repeated for each word in the text until all words have been tagged. To apply our method, we prepared a list of Arabic verbs and articles, 112 million entries combined, which we use in the comparisons to test our hypothesis. To evaluate the proposed method, we used 78,245 pre-tagged words from "The Quranic Arabic Corpus" and compared our template-based tagging approach with AraMorph, a rule-based tagging approach, and with the Stanford Log-linear Part-Of-Speech Tagger. AraMorph tagged 40% of the words correctly and the Stanford Log-linear Part-Of-Speech Tagger 68%, while our method tagged 68,501 words correctly (88%).
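The elimination rule described in this abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the two word lists below are tiny hypothetical stand-ins for the authors' verb and article lists, the function names are invented for illustration, and whitespace tokenization is an assumption (attached clitics are not handled).

```python
# Hypothetical miniature lists; the thesis uses far larger verb and article lists.
VERBS = {"كتب", "قال", "ذهب"}
ARTICLES = {"في", "من", "على", "إلى"}


def tag_word(word, verbs=VERBS, articles=ARTICLES):
    """Tag one word by list lookup; anything not found is assumed to be a noun."""
    if word in verbs:
        return "Verb"
    if word in articles:
        return "Article"
    return "Noun"  # the elimination hypothesis


def tag_text(text):
    """Tag every whitespace-separated word (tokenization is an assumption here)."""
    return [(word, tag_word(word)) for word in text.split()]


def accuracy(predicted, gold):
    """Fraction of positions where the predicted tag matches the gold tag."""
    if len(predicted) != len(gold):
        raise ValueError("tag sequences must have the same length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


if __name__ == "__main__":
    for word, tag in tag_text("ذهب الولد إلى المدرسة"):
        print(word, tag)
    # Reported evaluation figures from the abstract: 68,501 of 78,245 words.
    print(f"{68501 / 78245:.1%}")
```

The reported 88% follows from the counts in the abstract: 68,501 / 78,245 is approximately 0.875, which rounds to 88%.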


2017 ◽  
Vol 2 (3) ◽  
pp. 111-115
Author(s):  
Soufiane Farrah ◽  
Hanane El Manssouri ◽  
Ziyati Elhoussaine ◽  
Mohamed Ouzzif

Author(s):  
Hichem Rahab ◽  
Mahieddine Djoudi ◽  
Abdelhafid Zitouni

Today, consumers commonly look up other people's opinions about their purchasing experiences on the web before deciding to buy a product or a service. Sentiment analysis aims to help people benefit from the opinionated texts available on the web in their decision making, and business is one of its challenging application areas. Considerable work on sentiment analysis has been done for English and other Indo-European languages. Despite the large number of Arabic speakers and internet users, studies in Arabic sentiment analysis are still insufficient. This chapter presents the main challenges of Arabic sentiment analysis together with the solutions recently proposed in the literature. The chapter is organized in a novel manner: it first derives the main challenges from the surveyed works, then gives the proposed solutions for each challenge. The chapter finds that the future trend will be toward rule-based techniques and deep learning, which allow better handling of the inherent characteristics of the Arabic language.


Author(s):  
Nisrean Thalji ◽  
Nik Adilah ◽  
Walid Bani ◽  
Sohair Al-Hakeem ◽  
Zyad Thalji

Author(s):  
Toufik Sari ◽  
Mokhtar Sellami

In this paper, we present two post-processing methods for correcting Arabic words produced by text and/or speech recognizers. Both are designed to be adaptable, and both correct rejection and substitution word errors. The first is closely tied to the dictionary and is called "lexicon driven", while the other is more general, exploits contextual information, and is called "context driven". Properties of the Arabic language are very useful in morpho-lexical analysis, so they were strongly exploited in the development of the second method. Substitution errors are rewritten as production rules and used by a rule-based (production) system. Extensions to the other levels of language analysis are considered as future work.


2021 ◽  
Vol 297 ◽  
pp. 01058
Author(s):  
Anoual El Kah ◽  
Imad Zeroual

Arabic topic identification is a part of text classification that aims to assign a given text to a set of pre-defined classes (i.e., topics) based on its content and extracted features. This task can be performed using rule-based methods or data-driven approaches. The latter have gained more popularity since they require much less human effort to accurately classify a large number of documents. Due to the tremendous growth of Web content, primarily on news websites and social media, topic identification has received a great deal of attention in recent years and has become a cornerstone of both search engines and information retrieval. The Arabic language is the fourth most used language on the web and recorded the highest growth over the last two decades (2000–2020). Given these facts, it seems fair to look closer at the advancements in Arabic topic identification in the last decade. To this end, we performed the first scoping review of its kind that addresses recent studies in the field of Arabic topic identification and follows the PRISMA-ScR guidelines. This review is based on various online bibliographic databases (e.g., Springer, ScienceDirect, and IEEE Xplore) and dataset search engines (e.g., Google Dataset Search).


1992 ◽  
Vol 23 (1) ◽  
pp. 52-60 ◽  
Author(s):  
Pamela G. Garn-Nunn ◽  
Vicki Martin

This study explored whether or not standard administration and scoring of conventional articulation tests accurately identified children as phonologically disordered and whether or not information from these tests established severity level and programming needs. Results of standard scoring procedures from the Assessment of Phonological Processes-Revised, the Goldman-Fristoe Test of Articulation, the Photo Articulation Test, and the Weiss Comprehensive Articulation Test were compared for 20 phonologically impaired children. All tests identified the children as phonologically delayed/disordered, but the conventional tests failed to clearly and consistently differentiate varying severity levels. Conventional test results also showed limitations in error sensitivity, ease of computation for scoring procedures, and implications for remediation programming. The use of some type of rule-based analysis for phonologically impaired children is highly recommended.


2020 ◽  
Vol 63 (10) ◽  
pp. 3472-3487
Author(s):  
Natalia V. Rakhlin ◽  
Nan Li ◽  
Abdullah Aljughaiman ◽  
Elena L. Grigorenko

Purpose: We examined indices of narrative microstructure as metrics of language development and impairment in Arabic-speaking children. We examined their age sensitivity, correlations with standardized measures, and ability to differentiate children with average language and language impairment. Method: We collected story narratives from 177 children (54.2% boys) between 3.08 and 10.92 years old (M = 6.25, SD = 1.67) divided into six age bands. Each child also received standardized measures of spoken language (Receptive and Expressive Vocabulary, Sentence Imitation, and Pseudoword Repetition). Several narrative indices of microstructure were examined in each age band. Children were divided into (suspected) developmental language disorder and typical language groups using the standardized test scores and compared on the narrative indicators. Sensitivity and specificity of the narrative indicators that showed group differences were calculated. Results: The measures that showed age sensitivity included subject omission error rate, number of object clitics, correct use of subject–verb agreement, and mean length of utterance in words. The developmental language disorder group scored higher on subject omission errors (Cohen's d = 0.55) and lower on correct use of subject–verb agreement (Cohen's d = 0.48) than the typical language group. The threshold for impaired performance with the highest combination of specificity and sensitivity was the 35th percentile. Conclusions: Several indices of narrative microstructure appear to be valid metrics for documenting language development in children acquiring Gulf Arabic. Subject omission errors and correct use of subject–verb agreement differentiate children with typical and atypical levels of language development.

