ParsiPardaz: Persian Language Processing Toolkit

A question answering system is a type of information retrieval that takes a question from a user in natural language as the input and returns the best answer to it as the output. In this paper, a medical question answering system in the Persian language is designed and implemented. During this research, a dataset of diseases and drugs is collected and structured. The proposed system includes three main modules: question processing, document retrieval, and answer extraction. For the question processing module, a sequential architecture is designed which retrieves the main concept of a question by using different components. In these components, rule-based methods, natural language processing, and dictionary-based techniques are used. In the document retrieval module, the documents are indexed and searched using the Lucene library. The retrieved documents are ranked using similarity detection algorithms and the highest-ranked document is selected to be used by the answer extraction module. This module is responsible for extracting the most relevant section of the text in the retrieved document. During this research, different customized language processing tools such as part of speech tagger and lemmatizer are also developed for Persian. Evaluation results show that this system performs well for answering different questions about diseases and drugs. The accuracy of the system for 500 sample questions is 83.6%.
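The three-module flow described above (question processing, document retrieval, answer extraction) can be pictured with a minimal sketch. It is not the authors' implementation: a plain TF-IDF cosine score stands in for Lucene indexing and the similarity-based ranking, a hand-written stopword set stands in for the rule-based and dictionary-based question processing, and the English sample documents, `STOPWORDS`, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list standing in for the paper's question-processing rules.
STOPWORDS = {"what", "is", "the", "of", "a", "are", "for", "how", "to"}

def tokenize(text):
    return re.findall(r"\w+", text.lower())

def process_question(question):
    """Question processing: keep only content words as the query concept."""
    return [t for t in tokenize(question) if t not in STOPWORDS]

def tfidf_vectors(texts):
    """Build one TF-IDF vector per text, plus the IDF table used for queries."""
    tokenized = [tokenize(t) for t in texts]
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(texts)
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vectors, idf

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_match(query_terms, texts):
    """Return the index of the text most similar to the query terms."""
    vectors, idf = tfidf_vectors(texts)
    query = {t: c * idf.get(t, 0.0) for t, c in Counter(query_terms).items()}
    scores = [cosine(query, v) for v in vectors]
    return max(range(len(texts)), key=scores.__getitem__)

def answer(question, documents):
    """documents: a list of documents, each given as a list of section strings."""
    terms = process_question(question)
    # Document retrieval: rank whole documents and keep the top one.
    doc_idx = best_match(terms, [" ".join(sections) for sections in documents])
    # Answer extraction: pick the most relevant section of that document.
    sections = documents[doc_idx]
    return sections[best_match(terms, sections)]

# Hypothetical two-document collection (one disease, one drug).
DOCS = [
    ["Migraine is a recurring headache disorder.",
     "Symptoms of migraine include nausea and sensitivity to light."],
    ["Ibuprofen is a pain relief drug.",
     "Side effects of ibuprofen include stomach upset."],
]
print(answer("What are the symptoms of migraine?", DOCS))
```

Keeping the three stages as separate functions mirrors the sequential architecture in the abstract, so any one stage could be swapped for a stronger component without touching the others.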

HOMPer: A new hybrid system for opinion mining in the Persian language

Journal of Information Science ◽

10.1177/0165551519827886 ◽

2019 ◽

Vol 46 (1) ◽

pp. 101-117 ◽

Cited By ~ 3

Author(s):

Mohammad Ehsan Basiri ◽

Arman Kabiri

Keyword(s):

Machine Learning ◽

Language Processing ◽

Opinion Mining ◽

Feature Selection Method ◽

Large Data ◽

Data Set ◽

Persian Language ◽

Rating Prediction ◽

Bayes Algorithm ◽

Component Feature

Opinion mining is a subfield of data mining and natural language processing that is concerned with extracting users’ opinions and attitudes towards products or services from their comments on the Web. Persian opinion mining, in contrast to its counterpart in English, is a very new field of study and has therefore not received the attention it deserves. Existing methods for opinion mining in the Persian language may be classified into machine learning– and lexicon-based approaches. These methods have been proposed and successfully used for the polarity-detection problem. However, when they are applied to more complex tasks such as rating prediction, their results are unsatisfactory. In this study, an exhaustive investigation of machine learning– and lexicon-based methods is performed first. Then, a new hybrid method is proposed for the rating-prediction problem in the Persian language. Finally, the effects of the machine learning component, feature-selection method, normalisation method and combination level are investigated. Experimental results on a large data set containing 16,000 Persian customer reviews show that the proposed system achieves higher performance than the Naïve Bayes algorithm and a pure lexicon-based method. Moreover, the results demonstrate that the proposed method may also be used successfully for polarity detection.

Persian sentences to phoneme sequences conversion based on recurrent neural networks

Open Computer Science ◽

10.1515/comp-2016-0019 ◽

2016 ◽

Vol 6 (1) ◽

pp. 219-225 ◽

Cited By ~ 1

Author(s):

Yasser Mohseni Behbahani ◽

Bagher Babaali ◽

Mussa Turdalyuly

Keyword(s):

Neural Networks ◽

Language Processing ◽

Recurrent Neural Networks ◽

Short Term Memory ◽

Contextual Information ◽

Short Term ◽

Persian Language ◽

Short Vowels ◽

Sequential Labeling ◽

Long Short Term Memory

Grapheme-to-phoneme conversion is one of the main subsystems of Text-to-Speech (TTS) systems. Converting a sequence of written words to the corresponding phoneme sequence is more challenging for Persian than for other languages, because the standard orthography omits short vowels and the pronunciation of words depends on their position in a sentence. Common approaches in commercial Persian TTS systems rely on several modules and complicated models for natural language processing and homograph disambiguation, which makes implementation harder and reduces the overall precision of the system. In this paper we define grapheme-to-phoneme conversion as a sequential labeling problem and use modified Recurrent Neural Networks (RNNs) to create a smart, integrated model for this purpose. The recurrent networks are made bidirectional and equipped with Long Short-Term Memory (LSTM) blocks so that they capture most of the past and future contextual information for decision making. The experiments show that, in addition to having a unified structure, the bidirectional RNN-LSTM performs well in recognizing the pronunciation of Persian sentences, with a precision of more than 98 percent.
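To make the sequential-labeling framing concrete, the sketch below assigns one label to every written letter, where a label carries the letter's own sound plus any short vowel that the orthography omits. The romanized label scheme, the `ALIGNED_EXAMPLE` entry, and the function names are illustrative assumptions, not the paper's label inventory. In the paper's setting a bidirectional LSTM would predict the labels; here they are simply given.

```python
# Hypothetical aligned example: the word "کتاب" (ketāb, "book").
# The short vowel "e" is not written, so it is folded into the label of the first letter.
ALIGNED_EXAMPLE = [("ک", "ke"), ("ت", "t"), ("ا", "ā"), ("ب", "b")]

def to_training_pair(aligned):
    """Split aligned (grapheme, label) pairs into input and target sequences."""
    graphemes = [g for g, _ in aligned]
    labels = [label for _, label in aligned]
    return graphemes, labels

def decode(labels):
    """Concatenate per-letter labels into a phoneme string."""
    return "".join(labels)

graphemes, labels = to_training_pair(ALIGNED_EXAMPLE)
assert len(graphemes) == len(labels)  # exactly one label per input letter
print(decode(labels))
```

Because input and output sequences have the same length, the task fits a tagger directly and needs no separate alignment or homograph-disambiguation module at prediction time.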

Informal-to-Formal Word Conversion for Persian Language Using Natural Language Processing Techniques

2021 2nd International Conference on Computing, Networks and Internet of Things (CNIOT 2021) ◽

10.1145/3468691.3468710 ◽

2021 ◽

Author(s):

Amin Naemi ◽

Marjan Mansourvar ◽

Mostafa Naemi ◽

Bahman Damirchilu ◽

Ali Ebrahimi ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Persian Language ◽

Processing Techniques

Developing an effective scheme for translation and expansion of Persian user queries

Digital Scholarship in the Humanities ◽

10.1093/llc/fqz041 ◽

2019 ◽

Author(s):

Razieh Esmailpour ◽

Saeideh Ebrahimy ◽

Seyed Mostafa Fakhrahmad ◽

Mehdi Mohammadi ◽

Javad Abbaspour

Keyword(s):

Language Processing ◽

Google Scholar ◽

Journal Articles ◽

Query Translation ◽

Bibliographic Information ◽

Persian Language ◽

Language Needs ◽

Processing Techniques ◽

First Time ◽

User Queries

This study introduces a new source for the translation and expansion of user queries in the Persian language, with the aim of developing a bilingual dictionary. The required data were extracted from the English and Persian bibliographic information of journal articles and processed to build a dictionary for query translation and expansion, denoted the Query Expansion Assistant Database (QEAD). Psychology and educational sciences journals were selected as the sample, with the potential for extension to other domains. Persian–English author keywords were used for the translation part, and titles of English references were used to extract phrases with natural language processing techniques for the expansion part. The proposed algorithm is demonstrated. The approach was then assessed by human evaluators against Google Translate (GT) and Google Scholar. Although the evaluation of the translation part indicated a 60% match between GT and QEAD, in 40% of unmatched translations QEAD performed better according to expert evaluators. The expansion part of QEAD was compared with Google Scholar suggestions, which indicated that the expanded words of QEAD are comparable to those suggestions. Persian, as a low-resource language, needs better-quality lexicon translation. In addition, this is the first time the English–Persian bibliographic information of scientific journals has been used to mine lexicon translations. Since these journals are peer-reviewed, they can be a valuable source for translating users’ queries. Because journals are published frequently, users can be informed of the most prevalent and up-to-date words or phrases among scientists.
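The two parts of such a resource, keyword translation followed by phrase expansion, can be sketched as two dictionary lookups. This is only an illustration under stated assumptions: the entries below are invented and are not taken from QEAD, and `KEYWORD_TRANSLATIONS`, `TITLE_PHRASES`, and `translate_and_expand` are hypothetical names.

```python
# Hypothetical miniature of a translation-and-expansion dictionary.
# Entries are illustrative only and are not taken from QEAD.
KEYWORD_TRANSLATIONS = {"اضطراب": "anxiety", "انگیزش": "motivation"}
TITLE_PHRASES = {
    "anxiety": ["test anxiety", "social anxiety"],
    "motivation": ["academic motivation"],
}

def translate_and_expand(persian_query):
    """Translate a Persian keyword, then attach related English phrases."""
    english = KEYWORD_TRANSLATIONS.get(persian_query.strip())
    if english is None:
        return None  # out-of-dictionary query: no guess is made
    return {"translation": english, "expansions": TITLE_PHRASES.get(english, [])}

print(translate_and_expand("اضطراب"))
```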

Comprehension of Complex Structures by Persian-speaking Aphasics: The Role of Cognitive Load

Journal of Modern Rehabilitation ◽

10.18502/jmr.v15i4.7743 ◽

2021 ◽

Author(s):

Omid Azad

Keyword(s):

Cognitive Load ◽

Language Processing ◽

Judgment Task ◽

Complex Structures ◽

Broca's Aphasia ◽

Persian Language ◽

Grammatical Judgment ◽

The Impact ◽

Object Relative

Introduction: So far, many studies have investigated the extent and nature of the grammatical deficit in aphasia. However, to the best of our knowledge, this research is the first in the Persian language to examine how patients with Broca’s aphasia comprehend diverse syntactically complex structures. Materials and Methods: To examine the impact of task on aphasics’ performance, four Persian-speaking patients with Broca’s aphasia were compared with healthy controls matched for age, education and gender on two tasks: grammatical judgment and figurine act-out. The structures used to examine the subjects’ performance included agentive passive, subject cleft, object cleft, object relative clause, and object experiencer psychological verbs. Results: Our results, which supported the trade-off hypothesis, showed that our subjects generally performed better in the grammatical judgment task than in the figurine act-out task (P≤0.05). In the second task in particular, our within-task comparison showed that the patients’ problems were more severe in object cleft, object experiencer, and object relative clauses, all structures whose interpretation imposes a greater cognitive load. Conclusion: Our findings lend more weight to the interactive or constraint-based model of language processing.

PERLEX: A Bilingual Persian-English Gold Dataset for Relation Extraction

Scientific Programming ◽

10.1155/2021/8893270 ◽

2021 ◽

Vol 2021 ◽

pp. 1-8

Author(s):

Majid Asgari-Bidhendi ◽

Mehrdad Nasser ◽

Behrooz Janfada ◽

Behrouz Minaei-Bidgoli

Keyword(s):

Language Processing ◽

Question Answering ◽

State Of The Art ◽

Relation Extraction ◽

Neural Model ◽

Semantic Relations ◽

Base Population ◽

Neural Models ◽

Persian Language ◽

Knowledge Base Population

Relation extraction is the task of extracting semantic relations between entities in a sentence. It is an essential part of some natural language processing tasks such as information extraction, knowledge extraction, question answering, and knowledge base population. The main motivations of this research stem from a lack of a dataset for relation extraction in the Persian language as well as the necessity of extracting knowledge from the growing big data in the Persian language for different applications. In this paper, we present “PERLEX” as the first Persian dataset for relation extraction, which is an expert-translated version of the “SemEval-2010-Task-8” dataset. Moreover, this paper addresses Persian relation extraction utilizing state-of-the-art language-agnostic algorithms. We employ six different models for relation extraction on the proposed bilingual dataset, including a non-neural model (as the baseline), three neural models, and two deep learning models fed by multilingual BERT contextual word representations. The experiments result in the maximum F1-score of 77.66% (provided by BERTEM-MTB method) as the state of the art of relation extraction in the Persian language.
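Because PERLEX is described as a translation of the SemEval-2010 Task 8 dataset, each instance presumably pairs a sentence containing two marked entities with a directed relation label. The sketch below parses one such instance. The English sentence is a made-up example in the SemEval tagging style, and whether the PERLEX files use exactly this markup is an assumption; `parse_instance` is a hypothetical helper.

```python
import re

# SemEval-2010 Task 8 style: entities wrapped in <e1>...</e1> and <e2>...</e2>,
# labels written as Relation(eX,eY) or the undirected label "Other".
ENTITY = re.compile(r"<e([12])>(.*?)</e\1>")
LABEL = re.compile(r"^(?P<relation>[\w-]+)\((?P<first>e[12]),(?P<second>e[12])\)$")

def parse_instance(sentence, label):
    """Return the plain text, the two entity mentions, the relation and its direction."""
    entities = {f"e{num}": text for num, text in ENTITY.findall(sentence)}
    plain = ENTITY.sub(r"\2", sentence)  # strip the tags, keep the mention text
    match = LABEL.match(label)
    if match is None:  # undirected label such as "Other"
        return {"text": plain, "entities": entities, "relation": label, "args": None}
    return {
        "text": plain,
        "entities": entities,
        "relation": match["relation"],
        "args": (entities[match["first"]], entities[match["second"]]),
    }

example = parse_instance(
    "The <e1>fire</e1> was caused by a gas <e2>leak</e2>.",
    "Cause-Effect(e2,e1)",
)
print(example["relation"], example["args"])
```

The argument order in the label carries the direction: here `e2` (the leak) is the cause and `e1` (the fire) is the effect, which is why the arguments are returned in label order rather than sentence order.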

A Comparative Study on the Impact of Part-of-Speech Tagging on Parsing for the Persian Language Processing

Signal and Data Processing ◽

10.18869/acadpub.jsdp.13.4.121 ◽

2017 ◽

Vol 13 (4) ◽

pp. 121-132

Author(s):

Masood Ghayoomi

Keyword(s):

Comparative Study ◽

Language Processing ◽

Part Of Speech Tagging ◽

Persian Language ◽

Part Of Speech ◽

The Impact ◽

Speech Tagging

Automatic Wordnet Development for Low-Resource Languages using Cross-Lingual WSD

Journal of Artificial Intelligence Research ◽

10.1613/jair.4968 ◽

2016 ◽

Vol 56 ◽

pp. 61-87 ◽

Cited By ~ 5

Author(s):

Nasrin Taghizadeh ◽

Hesham Faili

Keyword(s):

Language Processing ◽

Semantic Processing ◽

Large Scale ◽

Word Sense Disambiguation ◽

Expectation Maximization Algorithm ◽

Word Sense ◽

Low Resource ◽

Persian Language ◽

Sense Disambiguation ◽

Cross Lingual

Wordnets are an effective resource for natural language processing and information retrieval, especially for semantic processing and meaning-related tasks. So far, wordnets have been constructed for many languages. However, the automatic development of wordnets for low-resource languages has not been well studied. In this paper, an Expectation-Maximization algorithm is used to create high-quality, large-scale wordnets for poor-resource languages. The proposed method benefits from cross-lingual word sense disambiguation and develops a wordnet using only a bilingual dictionary and a monolingual corpus. The proposed method has been applied to the Persian language and the resulting wordnet has been evaluated through several experiments. The results show that the induced wordnet has a precision score of 90% and a recall score of 35%.
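The precision and recall figures can be read as set comparisons between the induced word-to-synset links and a gold standard. The sketch below shows that computation on invented toy data; the link pairs and synset identifiers are hypothetical. It also derives the harmonic mean of the two reported scores, a figure the abstract itself does not report and which is shown only to illustrate the trade-off between a precise but sparse wordnet and a complete one.

```python
def precision_recall(induced, gold):
    """Compare induced (word, synset) links with a gold-standard set."""
    correct = len(induced & gold)
    precision = correct / len(induced) if induced else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

# Hypothetical toy data: (Persian word, synset id) pairs.
gold = {
    ("کتاب", "book.n.01"),
    ("آب", "water.n.01"),
    ("خانه", "house.n.01"),
    ("دست", "hand.n.01"),
}
induced = {("کتاب", "book.n.01"), ("آب", "water.n.01")}

p, r = precision_recall(induced, gold)
print(p, r)  # 1.0 0.5

# Scores reported in the abstract: precision 0.90, recall 0.35.
print(round(f1(0.90, 0.35), 3))  # 0.504
```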

Language processing is not a race against time

Behavioral and Brain Sciences ◽

10.1017/s0140525x15000692 ◽

2016 ◽

Vol 39 ◽

Cited By ~ 1

Author(s):

Giosuè Baggio ◽

Carmelo M. Vicario

Keyword(s):

Information Processing ◽

Human Brain ◽

Language Processing ◽

Cognitive Functions ◽

Memory Systems ◽

Language System ◽

Moderating Factors

We agree with Christiansen & Chater (C&C) that language processing and acquisition are tightly constrained by the limits of sensory and memory systems. However, the human brain supports a range of cognitive functions that mitigate the effects of information processing bottlenecks. The language system is partly organised around these moderating factors, not just around restrictions on storage and computation.
