Building an Information Retrieval Apparatus for Kannada Language: A Rule-Based Machine Interpretation System with Noise Suppression

Author(s):  
Shivani Kulkarni ◽  
R. H. Goudar
2017 ◽  
Vol 8 (2) ◽  
pp. 26-42 ◽  
Author(s):  
Md. Majharul Haque ◽  
Suraiya Pervin ◽  
Zerina Begum

The objective of this research is to replace pronouns with their corresponding nouns in Bangla news documents. To the best of our knowledge, this is the first attempt to solve the problem of dangling pronouns, where the corresponding noun is not available. If an information retrieval procedure extracts a sentence containing a dangling pronoun, it may confuse the user. To mitigate this problem, a method is proposed that uses general and special tagging, dependency parsing, full-name identification, and finally pronoun replacement. To develop this method, 3000 Bangla news documents were analyzed and several grammar books were studied. Seven experts in the Bangla language also assisted in this research. In evaluation, the proposed method achieves 71.80% accuracy in pronoun replacement.
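
The final replacement step can be illustrated with a minimal sketch. It is not the authors' method: it assumes the tagging stage has already labelled each token, and it substitutes a simple "most recent name" heuristic for the dependency parsing and full-name identification the abstract describes. The tag names `NAME` and `PRON` and the function `replace_pronouns` are hypothetical, and English tokens stand in for Bangla ones.

```python
def replace_pronouns(tagged_tokens):
    """Replace each pronoun with the most recently seen name.

    tagged_tokens: list of (token, tag) pairs, where the hypothetical tags
    "NAME" and "PRON" mark full names and pronouns; any other tag is ignored.
    """
    last_name = None
    output = []
    for token, tag in tagged_tokens:
        if tag == "NAME":
            last_name = token          # remember the candidate antecedent
            output.append(token)
        elif tag == "PRON" and last_name is not None:
            output.append(last_name)   # substitute the remembered name
        else:
            output.append(token)       # a pronoun with no antecedent stays as-is
    return output


tokens = [("Rahim", "NAME"), ("scored", "O"), (".", "O"),
          ("He", "PRON"), ("won", "O")]
print(" ".join(replace_pronouns(tokens)))
```

A pronoun that appears before any name is left untouched here, which is exactly the dangling case the abstract says needs the fuller pipeline.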


Author(s):  
Jacob Collard ◽  
Talapady N. Bhat ◽  
John Elliott ◽  
Ram Sriram ◽  
Ira Monarch ◽  
...  

Author(s):  
Parul Kalra ◽  
Deepti Mehrotra ◽  
Abdul Wahid

The focus of this chapter is the design of a cognitive information retrieval (CIR) framework using an inference engine (IE). The IE permits one to analyze the central concepts of information retrieval: information, information needs, and relevance. The aim is to propose an inference engine that takes user preferences adequately into account. With the cognitive inference engine (CIE) approach, complex queries are expected to return more relevant results than conventional database queries, which can return irrelevant and unsolicited results. The chapter highlights the framework of a cognitive rule-based engine in which preference queries are handled while keeping in mind the intention of the user, their performance, and optimization.


2018 ◽  
Vol 38 (1) ◽  
pp. 5
Author(s):  
Veena Makhija ◽  
Swapnil Ahuja

The emergent concept of ‘Big Data’ has shifted the paradigm from information retrieval to information extraction techniques. Information extraction techniques enable corpus analysis to draw useful interpretations and identify possible applications. The selection of an appropriate information extraction technique depends on the type of data being dealt with and its possible applications. In an R&D environment, published information is considered an authenticated benchmark for studying and analysing the growth pattern in a field of science, medicine, or business. This paper proposes a rule-based information extraction process applied to selected data extracted from a bibliographic database of published R&D papers. The aim of the study is to build up a database of relevant concepts, clean the retrieved data, and automate the process of information retrieval in the local database. For this purpose, concept-based ‘subject profiles’ in the area of advanced semiconductors, as well as rules for text extraction from metadata retrieved from the bibliographic database, were developed. This subset was used as an input to the knowledge domain to support R&D in the area of ‘advanced semiconductor materials and devices’ and to provide information services on the Intranet. The study found that concept-based pattern matching on the downloaded datasets yielded better results than using the controlled vocabulary of the source database.
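
Concept-based pattern matching over bibliographic metadata might look roughly like the sketch below. Everything in it is an assumption for illustration: the abstract does not give the actual subject profiles or rules, so the concepts (gallium nitride, silicon carbide), the field names (`title`, `abstract`, `keywords`), and the function `match_concepts` are hypothetical.

```python
import re

# Hypothetical subject profile: each concept maps to the patterns that signal it.
# Full phrases are matched case-insensitively; acronyms are case-sensitive.
SUBJECT_PROFILE = {
    "gallium nitride": [r"(?i)\bgallium nitride\b", r"\bGaN\b"],
    "silicon carbide": [r"(?i)\bsilicon carbide\b", r"\bSiC\b"],
}


def match_concepts(record, profile=SUBJECT_PROFILE):
    """Return the profile concepts found in a record's text fields."""
    text = " ".join(record.get(field, "")
                    for field in ("title", "abstract", "keywords"))
    return [concept
            for concept, patterns in profile.items()
            if any(re.search(pattern, text) for pattern in patterns)]


record = {"title": "Growth of GaN films on SiC substrates",
          "abstract": "We report epitaxial growth results."}
print(match_concepts(record))
```

Matching against free-text fields rather than only the database's assigned index terms is what lets a profile catch records that the controlled vocabulary labels differently.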


2015 ◽  
Vol 6 (4) ◽  
Author(s):  
Bonifacius Vicky Indriyono ◽  
Ema Utami ◽  
Andi Sunyoto

Abstract. Stemming is the process of mapping and decomposing the various forms (variants) of a word to its root word. This process is also referred to as conflation. Stemming is widely used in information retrieval to improve the quality of the information obtained. Stemming can work either with a dictionary of root words or with affix rules. The Porter stemmer for Indonesian, commonly referred to as the Tala stemmer, uses rule-based analysis to find the root of a word. The Tala stemmer does not use a dictionary in the process; instead, it uses a rule-based algorithm. The principal issue addressed in this study is how to classify or determine the type of books and library materials in a library quickly and effectively, so as to minimize errors in determining the type of book. The solution is to apply stemming with the Porter stemmer for Indonesian.
Keywords: Stemming, Information Retrieval, Porter Stemmer, Classification
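
Dictionary-free, rule-based affix stripping of the kind described above can be sketched as follows. This is a deliberately simplified illustration, not the Tala stemmer itself: the affix lists are partial, the ordering of steps is an assumption, and the minimum stem length `MIN_STEM` is a hypothetical guard rather than a rule from the original algorithm.

```python
# Partial affix lists for illustration; longer affixes come first so that
# "meng" is tried before "me", and "per" before "pe".
PARTICLES = ("lah", "kah", "pun")
POSSESSIVES = ("nya", "ku", "mu")
PREFIXES = ("meng", "meny", "men", "mem", "me",
            "peng", "peny", "pen", "pem", "per", "pe",
            "di", "ter", "ke", "ber")
SUFFIXES = ("kan", "an", "i")
MIN_STEM = 3  # assumed guard: never strip below three characters


def strip_suffix(word, suffixes):
    """Remove the first matching suffix that leaves a long enough stem."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[:-len(suffix)]
    return word


def strip_prefix(word, prefixes):
    """Remove the first matching prefix that leaves a long enough stem."""
    for prefix in prefixes:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM:
            return word[len(prefix):]
    return word


def stem(word):
    """Strip particle, possessive, prefix, then derivational suffix, in that order."""
    word = word.lower()
    word = strip_suffix(word, PARTICLES)
    word = strip_suffix(word, POSSESSIVES)
    word = strip_prefix(word, PREFIXES)
    word = strip_suffix(word, SUFFIXES)
    return word


for example in ("membaca", "bukunya", "dibacakan"):
    print(example, "->", stem(example))
```

Because no dictionary is consulted, rules like these will over-stem some root words that merely happen to end in an affix-like string; that trade-off is inherent to the purely rule-based approach.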

