ParsiPardaz: Persian Language Processing Toolkit

Author(s):  
Zahra Sarabi ◽  
Hooman Mahyar ◽  
Mojgan Farhoodi
2020 ◽  
Vol 29 (06) ◽  
pp. 2050019
Author(s):  
Hadi Veisi ◽  
Hamed Fakour Shandi

A question answering system is a type of information retrieval system that takes a user's question in natural language as input and returns the best answer as output. In this paper, a medical question answering system for the Persian language is designed and implemented. During this research, a dataset of diseases and drugs is collected and structured. The proposed system includes three main modules: question processing, document retrieval, and answer extraction. For the question processing module, a sequential architecture is designed that extracts the main concept of a question through several components, which use rule-based, natural language processing, and dictionary-based techniques. In the document retrieval module, the documents are indexed and searched using the Lucene library. The retrieved documents are ranked using similarity detection algorithms, and the highest-ranked document is passed to the answer extraction module, which extracts the most relevant section of its text. Customized Persian language processing tools, such as a part-of-speech tagger and a lemmatizer, are also developed during this research. Evaluation results show that the system performs well in answering different questions about diseases and drugs, with an accuracy of 83.6% on 500 sample questions.
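The abstract says the retrieved documents are re-ranked with similarity detection algorithms but does not name them. As a minimal sketch, assuming TF-IDF weighting, cosine similarity, and whitespace tokenization (none of which is confirmed by the abstract), the ranking step could look like this in plain Python; `rank_documents` and the toy English documents are hypothetical stand-ins for the Lucene-retrieved Persian texts.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per document, plus the IDF table."""
    tokenized = [d.split() for d in docs]  # assumption: whitespace tokenization
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF so terms occurring in every document still get a small weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_documents(question, docs):
    """Return document indices sorted from most to least similar to the question."""
    vectors, idf = tfidf_vectors(docs)
    q_tf = Counter(question.split())
    # Question terms that never occur in the collection are ignored.
    q_vec = {t: q_tf[t] * idf[t] for t in q_tf if t in idf}
    scores = [cosine(q_vec, v) for v in vectors]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


docs = [
    "aspirin is used to reduce fever and pain",
    "diabetes is a disease that affects blood sugar",
    "insulin is a drug used to treat diabetes",
]
print(rank_documents("what drug treats diabetes", docs)[0])  # 2
```

A real Persian pipeline would normalise and lemmatise tokens first (for example, with a lemmatizer like the one the abstract mentions), since surface forms such as "treats" and "treat" do not match under plain whitespace tokenization.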


2019 ◽  
Vol 46 (1) ◽  
pp. 101-117 ◽  
Author(s):  
Mohammad Ehsan Basiri ◽  
Arman Kabiri

Opinion mining is a subfield of data mining and natural language processing concerned with extracting users' opinions and attitudes towards products or services from their comments on the Web. Unlike its English counterpart, Persian opinion mining is a new field of study and has not yet received the attention it deserves. Existing methods for Persian opinion mining can be classified into machine learning–based and lexicon-based approaches. These methods have been proposed and successfully used for the polarity-detection problem; however, when applied to more complex tasks such as rating prediction, their results are unsatisfactory. In this study, an exhaustive investigation of machine learning– and lexicon-based methods is first performed. Then, a new hybrid method is proposed for the rating-prediction problem in Persian. Finally, the effects of the machine learning component, feature-selection method, normalisation method, and combination level are investigated. Experimental results on a large data set of 16,000 Persian customer reviews show that the proposed system outperforms the Naïve Bayes algorithm and a pure lexicon-based method. Moreover, the results demonstrate that the proposed method can also be used successfully for polarity detection.
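The abstract mentions a hybrid of machine learning and lexicon components and an investigation of the combination level, but gives no formula. The sketch below assumes one possible decision-level combination: a weighted average of a classifier's predicted star rating and a rating derived from a polarity lexicon. The lexicon entries, the weight `alpha`, and the function names are hypothetical and not taken from the paper.

```python
# Hypothetical toy polarity lexicon: word -> polarity score in [-1, 1].
LEXICON = {"excellent": 1.0, "good": 0.6, "ok": 0.1, "bad": -0.6, "terrible": -1.0}


def lexicon_rating(review: str) -> float:
    """Map the mean polarity of known words onto a 1-5 star scale (3.0 if none match)."""
    scores = [LEXICON[w] for w in review.lower().split() if w in LEXICON]
    if not scores:
        return 3.0
    mean = sum(scores) / len(scores)
    return 3.0 + 2.0 * mean  # -1 -> 1 star, 0 -> 3 stars, +1 -> 5 stars


def hybrid_rating(review: str, ml_rating: float, alpha: float = 0.5) -> int:
    """Blend a machine-learning rating with the lexicon rating, then round and clamp to 1-5."""
    blended = alpha * ml_rating + (1 - alpha) * lexicon_rating(review)
    return max(1, min(5, round(blended)))


# ml_rating would come from a trained classifier; here it is a fixed example value.
print(hybrid_rating("excellent phone good battery", ml_rating=4.0))  # 4
```

Tuning `alpha` on held-out data is the natural way to decide how much weight each component gets; combining earlier, at the feature level, is another option the paper's "combination level" experiments appear to cover.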


2016 ◽  
Vol 6 (1) ◽  
pp. 219-225 ◽  
Author(s):  
Yasser Mohseni Behbahani ◽  
Bagher Babaali ◽  
Mussa Turdalyuly

Grapheme-to-phoneme conversion is one of the main subsystems of text-to-speech (TTS) systems. Converting a sequence of written words into the corresponding phoneme sequence is more challenging for Persian than for other languages, because the standard orthography omits short vowels and the pronunciation of words depends on their position in the sentence. Common approaches in Persian commercial TTS systems rely on several modules and complicated models for natural language processing and homograph disambiguation, which make implementation harder and reduce the overall precision of the system. In this paper, we define grapheme-to-phoneme conversion as a sequential labeling problem and use modified recurrent neural networks (RNNs) to create a smart, integrated model for this purpose. The recurrent networks are made bidirectional and equipped with long short-term memory (LSTM) blocks to capture as much past and future contextual information as possible for decision making. The experiments show that, in addition to having a unified structure, the bidirectional RNN-LSTM performs well in recognizing the pronunciation of Persian sentences, with a precision of more than 98 percent.
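Framing grapheme-to-phoneme conversion as sequential labeling means assigning each written character a label, and a label can carry an unwritten short vowel. The sketch below assumes a one-label-per-grapheme alignment and a simple romanisation (both hypothetical, since the abstract does not specify the label set). It also shows the left and right context window that a bidirectional model can draw on at each position; the BiLSTM itself is not implemented here.

```python
def to_labeled_sequence(graphemes, labels):
    """Pair each written character with the phoneme string it produces."""
    if len(graphemes) != len(labels):
        raise ValueError("each grapheme needs exactly one label")
    return list(zip(graphemes, labels))


def context_windows(graphemes, size=2, pad="#"):
    """For each position, return (left context, current grapheme, right context),
    i.e. the past and future information a bidirectional model can use."""
    padded = [pad] * size + list(graphemes) + [pad] * size
    return [
        (padded[i:i + size], padded[i + size], padded[i + size + 1:i + 2 * size + 1])
        for i in range(len(graphemes))
    ]


word = "کتاب"  # "ketāb" (book); the short vowel /e/ is not written
labels = ["ke", "t", "ā", "b"]  # hypothetical label set: first letter also carries /e/
pairs = to_labeled_sequence(word, labels)
pronunciation = "".join(label for _, label in pairs)
print(pronunciation)  # ketāb
```

Because the omitted vowel is folded into a grapheme's label, predicting pronunciation reduces to choosing one label per character, which is the setting where a bidirectional recurrent tagger fits naturally.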


Author(s):  
Razieh Esmailpour ◽  
Saeideh Ebrahimy ◽  
Seyed Mostafa Fakhrahmad ◽  
Mehdi Mohammadi ◽  
Javad Abbaspour

This study aims to introduce a new source for the translation and expansion of Persian user queries in order to develop a bilingual dictionary. To this end, the required data were extracted and processed from the English and Persian bibliographic information of journal articles to build a dictionary for query translation and expansion, called the Query Expansion Assistant Database (QEAD). Psychology and educational sciences journals were selected as the sample, with the potential to extend the approach to other domains. Persian–English author keywords were used for the translation part, and the titles of English references were used to extract phrases with natural language processing techniques for the expansion part. The proposed algorithm is demonstrated. The approach was then evaluated by human evaluators using Google Translate (GT) and Google Scholar. Although the evaluation of the translation part showed a 60% match between GT and QEAD, expert evaluators judged QEAD to perform better in the remaining 40% of unmatched translations. A comparison of the expansion part of QEAD with Google Scholar suggestions indicated that QEAD's expanded terms are on par with those suggestions. As a low-resource language, Persian needs higher-quality lexical translation resources. In addition, this is the first study to mine lexical translations from the English–Persian bibliographic information of scientific journals. Since these journals are peer-reviewed, they can be a valuable source for translating user queries, and because journals are published frequently, users can be informed of the most prevalent and up-to-date words and phrases among scientists.
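The abstract says Persian–English author keywords are used for the translation part but does not describe how pairs are extracted. A minimal sketch, assuming that each article lists its Persian and English keywords in the same order (an assumption, not something the abstract states), is to count aligned pairs across articles and keep the most frequent English rendering for each Persian keyword; `build_dictionary` and the sample records are hypothetical.

```python
from collections import Counter, defaultdict


def build_dictionary(records):
    """records: iterable of (persian_keywords, english_keywords), one pair per article.
    Assumes the i-th Persian keyword corresponds to the i-th English keyword."""
    counts = defaultdict(Counter)
    for fa_keywords, en_keywords in records:
        if len(fa_keywords) != len(en_keywords):
            continue  # skip articles whose keyword lists cannot be aligned
        for fa, en in zip(fa_keywords, en_keywords):
            counts[fa.strip()][en.strip().lower()] += 1
    # Keep the most frequent English rendering for each Persian keyword.
    return {fa: c.most_common(1)[0][0] for fa, c in counts.items()}


records = [
    (["انگیزش", "یادگیری"], ["Motivation", "Learning"]),
    (["انگیزش", "اضطراب"], ["Motivation", "Anxiety"]),
    (["انگیزش"], ["Motive"]),
]
dictionary = build_dictionary(records)
print(dictionary)
```

Counting across many articles lets a frequent, conventional translation ("motivation") outvote a rarer variant ("motive"), which matches the abstract's point that peer-reviewed journals reflect prevalent usage among scientists.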


Author(s):  
Omid Azad

Introduction: Many studies have investigated the extent and nature of the grammatical deficit in aphasia. However, to the best of our knowledge, this is the first study in Persian to examine how patients with Broca's aphasia comprehend various syntactically complex structures. Materials and Methods: To examine the effect of task on aphasic patients' performance, four Persian-speaking patients with Broca's aphasia were compared with healthy controls matched for age, education, and gender on two different tasks: grammatical judgment and figurine act-out. The structures used to examine the subjects' performance included agentive passives, subject clefts, object clefts, object relative clauses, and object-experiencer psychological verbs. Results: Our results, which support the trade-off hypothesis, showed that the subjects generally performed better on the grammatical judgment task than on the figurine act-out task (P≤0.05). In the second task in particular, which served as our within-task comparison, the patients' problems were more severe with object clefts, object-experiencer verbs, and object relative clauses, all structures whose interpretation imposes a greater cognitive load. Conclusion: Our findings lend further support to the interactive, or constraint-based, model of language processing.


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Majid Asgari-Bidhendi ◽  
Mehrdad Nasser ◽  
Behrooz Janfada ◽  
Behrouz Minaei-Bidgoli

Relation extraction is the task of extracting semantic relations between entities in a sentence. It is an essential part of several natural language processing tasks, such as information extraction, knowledge extraction, question answering, and knowledge base population. The main motivations of this research are the lack of a relation extraction dataset for Persian and the need to extract knowledge from the growing volume of Persian big data for various applications. In this paper, we present “PERLEX”, the first Persian dataset for relation extraction, which is an expert-translated version of the “SemEval-2010-Task-8” dataset. Moreover, this paper addresses Persian relation extraction using state-of-the-art language-agnostic algorithms. We employ six different models for relation extraction on the proposed bilingual dataset: a non-neural model (as the baseline), three neural models, and two deep learning models fed by multilingual BERT contextual word representations. The experiments yield a maximum F1-score of 77.66% (achieved by the BERTEM-MTB method), which sets the state of the art for relation extraction in Persian.
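SemEval-2010 Task 8 results are usually reported as a macro-averaged F1 over the relation types, excluding the catch-all Other class. The function below is a simplified approximation of that metric, not the official scorer: it groups labels by relation type and counts a prediction as correct only when the full label, including the argument direction, matches. The label strings follow the dataset's `Type(e1,e2)` convention; the example data are made up.

```python
def relation_type(label: str) -> str:
    """'Cause-Effect(e1,e2)' -> 'Cause-Effect'; 'Other' -> 'Other'."""
    return label.split("(")[0]


def macro_f1(gold, pred, ignore="Other"):
    """Macro-averaged F1 over relation types, skipping the `ignore` class."""
    types = {relation_type(label) for label in list(gold) + list(pred)} - {ignore}
    f1_scores = []
    for t in sorted(types):
        # A prediction is correct only if type and direction both match.
        correct = sum(1 for g, p in zip(gold, pred) if g == p and relation_type(g) == t)
        predicted = sum(1 for p in pred if relation_type(p) == t)
        actual = sum(1 for g in gold if relation_type(g) == t)
        precision = correct / predicted if predicted else 0.0
        recall = correct / actual if actual else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores) if f1_scores else 0.0


gold = ["Cause-Effect(e1,e2)", "Cause-Effect(e2,e1)", "Other", "Component-Whole(e1,e2)"]
pred = ["Cause-Effect(e1,e2)", "Cause-Effect(e1,e2)", "Other", "Component-Whole(e1,e2)"]
print(macro_f1(gold, pred))  # 0.75
```

Here the wrong-direction Cause-Effect prediction halves that type's precision and recall (F1 = 0.5), while Component-Whole is perfect (F1 = 1.0), giving a macro average of 0.75.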


2016 ◽  
Vol 56 ◽  
pp. 61-87 ◽  
Author(s):  
Nasrin Taghizadeh ◽  
Hesham Faili

Wordnets are an effective resource for natural language processing and information retrieval, especially for semantic processing and meaning-related tasks. So far, wordnets have been constructed for many languages. However, the automatic development of wordnets for low-resource languages has not been well studied. In this paper, an Expectation-Maximization algorithm is used to create high-quality, large-scale wordnets for low-resource languages. The proposed method benefits from cross-lingual word sense disambiguation and develops a wordnet using only a bilingual dictionary and a monolingual corpus. The proposed method has been applied to Persian, and the resulting wordnet has been evaluated through several experiments. The results show that the induced wordnet has a precision of 90% and a recall of 35%.
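The abstract reports precision and recall separately. For reference, their harmonic mean (the F1 score, which the abstract itself does not report) works out as follows:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.90 \times 0.35}{0.90 + 0.35}
    = \frac{0.63}{1.25}
    = 0.504
```

The low recall pulls F1 to roughly 50%, well below the 90% precision, which reflects the usual trade-off of favouring high-confidence synset assignments over coverage.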


2016 ◽  
Vol 39 ◽  
Author(s):  
Giosuè Baggio ◽  
Carmelo M. Vicario

We agree with Christiansen & Chater (C&C) that language processing and acquisition are tightly constrained by the limits of sensory and memory systems. However, the human brain supports a range of cognitive functions that mitigate the effects of information processing bottlenecks. The language system is partly organised around these moderating factors, not just around restrictions on storage and computation.

