Selecting effective index terms using a decision tree

2002 ◽  
Vol 8 (2-3) ◽  
pp. 193-207 ◽  
Author(s):  
Tokunaga Takenobu ◽  
Kimura Kenji ◽  
Ogibayashi Hironori ◽  
Tanaka Hozumi

This paper explores the effectiveness of index terms more complex than the single words used in conventional information retrieval systems. Retrieval is done in two phases: in the first, a conventional retrieval method (the Okapi system) is used; in the second, complex index terms such as syntactic relations and single words with part-of-speech information are introduced to rerank the results of the first phase. We evaluated the effectiveness of the different types of index terms through experiments using the TREC-7 test collection and 50 queries. The retrieval effectiveness was improved for 32 out of 50 queries. Based on this investigation, we then introduce a method to select effective index terms by using a decision tree. Further experiments with the same test collection showed that retrieval effectiveness was improved in 25 of the 50 queries.
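
As a rough illustration of the second phase described above, the following Python sketch trains a decision tree to predict whether a candidate complex index term will help reranking. It is not the authors' code: the features (document frequency in the top-ranked list, idf, a syntactic-relation flag) and the toy training data are assumptions made purely for illustration.

```python
from sklearn.tree import DecisionTreeClassifier

# Each row describes a candidate complex index term:
# [doc frequency in the top-ranked list, idf, is_syntactic_relation]
X_train = [
    [12, 2.1, 1],
    [ 3, 5.0, 1],
    [40, 0.8, 0],
    [ 7, 3.2, 0],
]
y_train = [1, 1, 0, 0]  # 1 = the term improved retrieval effectiveness

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

# Keep only the candidates the tree predicts to be effective, then use them
# to rerank the first-phase (Okapi) result list.
candidates = {
    "retrieval+system (syntactic relation)": [10, 2.5, 1],
    "system/NOUN": [35, 0.9, 0],
}
selected = [term for term, feats in candidates.items()
            if tree.predict([feats])[0] == 1]
print("index terms used for reranking:", selected)
```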

2014 ◽  
Vol 6 (1) ◽  
pp. 9-14
Author(s):  
Stefanie Sirapanji ◽  
Seng Hansun

Beauty is a precious asset for everyone, and everyone wants a healthy face. Unfortunately, problems such as acne, freckles, wrinkles, dullness, and oily or dry skin appear on their own. Many beauty clinics are therefore available to help people solve these problems, but not everyone can enjoy their facilities, for example people living in the suburbs; the uneven distribution of doctors and the high cost of treatments are among the reasons. In this research, a system is built that helps patients find solutions to their facial problems. The decision tree method is used to make decisions based on the presented scheme. In the system's experiments, the average accuracy reached 100%. Index Terms–Acne, Decision Tree, Dry Skin, Dullness, Facial Problems, Freckles, Wrinkles, Oily Skin, Expert System.
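
As a minimal sketch of how a decision-tree expert system of this kind can be structured, the following Python example encodes a tiny hand-built tree that maps reported symptoms to one of the facial problems listed above. The questions, their order, and the resulting diagnoses are assumptions for illustration only, not the rules of the actual system.

```python
def diagnose(inflamed_spots: bool, brown_spots: bool,
             shiny_skin: bool, flaky_skin: bool) -> str:
    """Walk a tiny hand-built decision tree over yes/no symptom answers."""
    if inflamed_spots:
        return "acne"
    if brown_spots:
        return "freckles"
    if shiny_skin:
        return "oily skin"
    if flaky_skin:
        return "dry skin"
    return "dull skin"

# Example consultation: shiny skin only -> "oily skin"
print(diagnose(inflamed_spots=False, brown_spots=False,
               shiny_skin=True, flaky_skin=False))
```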


1985 ◽  
Vol 8 (2) ◽  
pp. 253-267
Author(s):  
S.K.M. Wong ◽  
Wojciech Ziarko

In information retrieval, it is common to model index terms and documents as vectors in a suitably defined vector space. The main difficulty with this approach is that the explicit representation of term vectors is not known a priori. For this reason, the vector space model adopted by Salton for the SMART system treats the terms as a set of orthogonal vectors. In such a model it is often necessary to adopt a separate, corrective procedure to take into account the correlations between terms. In this paper, we propose a systematic method (the generalized vector space model) to compute term correlations directly from the automatic indexing scheme. We also demonstrate how such correlations can be incorporated, with minimal modification, into existing vector-based information retrieval systems.
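
The following numpy sketch illustrates the core idea: a term correlation matrix is computed directly from the term-document (indexing) matrix rather than assumed to be the identity, and it is then inserted into the usual inner-product scoring. The matrix values are toy data, and the formulation is a simplified illustration rather than the paper's exact construction.

```python
import numpy as np

# Rows = index terms, columns = documents (e.g. tf weights from automatic indexing)
D = np.array([
    [2, 0, 1, 0],   # term "retrieval"
    [1, 0, 2, 0],   # term "index"
    [0, 3, 0, 1],   # term "clinic"
], dtype=float)

G = D @ D.T                     # term-term correlation (Gram) matrix
q = np.array([1.0, 0.0, 0.0])   # query mentions only "retrieval"

scores = q @ G @ D              # document scores with term correlations;
print(scores)                   # with G = I this reduces to the orthogonal model
```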


2021 ◽  
Vol 20 (2) ◽  
pp. 20-31
Author(s):  
István Pozsgai

The aim of this work is to examine the system of syntactic relations between cardinal numerals and the words that belong to them in the Kievo-Pecherskiy Paterikon, which was compiled in the 13th century. The manuscript on which the published text is based was copied in the late 15th or early 16th century. I mainly search for phenomena that can give information about the conditions of the genesis and development of the numerals as a new, independent part of speech, paying particular attention to phenomena that can be connected with the unification of the several types of syntactic relations between cardinal numerals and their associated words. All quantitative constructions are examined except for constructions containing the numeral 1 as a prime numeral. The quantitative constructions found are grouped according to the type of combination of cardinal numerals with nominals or participles. Particular attention is paid to combinations of cardinal numerals with associated words that differ from the norms of other monuments, such as Old Church Slavonic monuments of the Russian recension, Old Russian monuments, and early Russian Church Slavonic monuments, since it is precisely these phenomena that can indicate the process by which cardinal numerals acquired common morphological and syntactic properties. On the basis of the quantitative constructions that do not correspond to the above-mentioned norms, three important grammatical phenomena are distinguished that can indicate the process of replacing old norms with new ones. For contrast, data from other manuscripts are also presented.


Author(s):  
Tomoki Takada ◽  
Mizuki Arai ◽  
Tomohiro Takagi

Nowadays, an increasingly large amount of information exists on the web, so a method is needed that enables users to find the information they need quickly, which is becoming increasingly difficult. To solve this problem, information retrieval systems like Google and recommendation systems like Amazon's are used. In this paper, we focus on information retrieval systems. These retrieval systems require index terms, which affect the precision of retrieval. Index terms are generally decided in one of two ways. One is to analyze a text using natural language processing and decide index terms using various statistics. The other is to have a person choose document keywords as index terms. However, the latter method requires too much time and effort and becomes more impractical as the amount of information grows. We therefore propose the Nikkei annotator system, which is based on a model of the human brain, learns patterns of past keyword annotation, and automatically outputs the keywords that users prefer. The purposes of the proposed method are to automate manual keyword annotation and to achieve high-speed, high-accuracy keyword annotation. Experimental results showed that the proposed method is more accurate than TF-IDF and Naive Bayes in P@5 and P@10, and that it can annotate about 19 times faster than Naive Bayes.
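
As a small illustration of the evaluation measure mentioned above, the following helper (not from the paper) computes precision at k for a ranked list of suggested keywords against a gold annotation set; the example keywords are invented.

```python
def precision_at_k(ranked_keywords, gold_keywords, k):
    """Fraction of the top-k suggested keywords that appear in the gold set."""
    top_k = ranked_keywords[:k]
    hits = sum(1 for kw in top_k if kw in gold_keywords)
    return hits / k

ranked = ["economy", "bank", "yen", "policy", "export", "sports"]
gold = {"economy", "yen", "export", "trade"}
print(precision_at_k(ranked, gold, 5))   # 3 of the top 5 are correct -> 0.6
```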


2017 ◽  
pp. 35-46 ◽  
Author(s):  
Irene Doval

This paper reviews the author’s experiences of tokenizing and POS tagging a bilingual parallel corpus, the PaGeS Corpus, consisting mostly of German and Spanish fictional texts. This is part of an ongoing process of annotating the corpus with part-of-speech information. The study discusses the specific problems encountered so far: on the one hand, tagger performance degrades significantly on fictional data, and on the other, pre-existing annotation schemes are all language-specific. To further improve accuracy during post-editing, the author has developed a common tagset and identified the major error patterns.
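
A minimal sketch of the common-tagset idea follows: language-specific tags (German STTS-style and Spanish EAGLES-style in this example) are mapped onto shared coarse categories during post-editing. The particular mapping entries are assumptions chosen for illustration and are not the author's actual tagset.

```python
COMMON_TAGSET = {
    # German (STTS-style)                  # Spanish (EAGLES-style)
    "NN": "NOUN", "NE": "PROPN",           "NC": "NOUN", "NP": "PROPN",
    "VVFIN": "VERB", "ADJA": "ADJ",        "VM": "VERB", "AQ": "ADJ",
}

def to_common(tag: str) -> str:
    """Map a language-specific tag to the common tagset; 'X' flags manual review."""
    return COMMON_TAGSET.get(tag, "X")

print(to_common("VVFIN"), to_common("AQ"), to_common("XYZ"))   # VERB ADJ X
```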


2020 ◽  
Vol 11 (1) ◽  
pp. 1-8
Author(s):  
Aqib Ali ◽  
Jamal Abdul Nasir ◽  
Muhammad Munawar Ahmed ◽  
Samreen Naeem ◽  
Sania Anam ◽  
...  

Background: Humans convey many emotions during a conversation, and facial expressions carry information about those emotions. Objectives: This study proposes a Machine Learning (ML) approach to emotion recognition based on a statistical analysis of facial expressions in digital images. Methodology: A dataset of 600 digital images divided into 6 classes (Anger, Happy, Fear, Surprise, Sad, and Normal) was collected from the publicly available Taiwan Facial Expression Images Database. In the first step, all images are converted to grayscale and 4 Regions of Interest (ROIs) are created on each image, so the dataset is divided into 2,400 (600 × 4) sub-images. In the second step, 3 types of statistical features, namely texture, histogram, and binary features, are extracted from each ROI. The third step is statistical feature optimization using the best-first search algorithm. Lastly, the optimized statistical feature dataset is fed to various ML classifiers. Results: The analysis was divided into two phases. First, boosting-based ML classifiers (LogitBoost, AdaBoostM1, and Stacking) obtained 94.11%, 92.15%, and 89.21% accuracy, respectively. Second, decision tree algorithms (J48, Random Forest, and Random Committee) obtained 97.05%, 93.14%, and 92.15% accuracy, respectively. Conclusion: The decision-tree-based J48 classifier gave the best result, with 97.05% classification accuracy.
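
A rough Python sketch of this pipeline is shown below, with synthetic data standing in for the face images. The quadrant ROI layout, the simple histogram statistics, and the use of a random forest in place of J48 are illustrative assumptions, not the study's exact configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def roi_features(img):
    """Split a grayscale image into 4 quadrant ROIs and take simple histogram stats."""
    h, w = img.shape
    rois = [img[:h//2, :w//2], img[:h//2, w//2:],
            img[h//2:, :w//2], img[h//2:, w//2:]]
    feats = []
    for roi in rois:
        feats += [roi.mean(), roi.std(), np.median(roi)]
    return feats

# Synthetic stand-in dataset: 60 fake 64x64 grayscale images, 6 emotion classes
images = rng.integers(0, 256, size=(60, 64, 64))
labels = rng.integers(0, 6, size=60)

X = np.array([roi_features(img) for img in images])
clf = RandomForestClassifier(random_state=0).fit(X, labels)
print("training accuracy on synthetic data:", clf.score(X, labels))
```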


2016 ◽  
Vol 105 (1) ◽  
pp. 63-76
Author(s):  
Theresa Guinard

Abstract Morphological analysis (finding the component morphemes of a word and tagging morphemes with part-of-speech information) is a useful preprocessing step in many natural language processing applications, especially for synthetic languages. Compound words from the constructed language Esperanto are formed by straightforward agglutination, but for many words, there is more than one possible sequence of component morphemes. However, one segmentation is usually more semantically probable than the others. This paper presents a modified n-gram Markov model that finds the most probable segmentation of any Esperanto word, where the model’s states represent morpheme part-of-speech and semantic classes. The overall segmentation accuracy was over 98% for a set of presegmented dictionary words.
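
The following simplified Python sketch (not the paper's model) shows the general mechanism: a bigram Markov model over morpheme classes combined with dynamic programming to pick the most probable segmentation of a word. The toy lexicon and log probabilities are invented for illustration.

```python
LEXICON = {                 # morpheme -> part-of-speech / semantic class
    "san": "ADJ_ROOT",      # 'healthy'
    "ig": "CAUSATIVE",      # causative suffix
    "o": "NOUN_END",        # noun ending
    "a": "ADJ_END",         # adjective ending
    "n": "ACCUSATIVE",      # accusative ending
}
TRANS = {                   # log P(class2 | class1), toy values
    ("START", "ADJ_ROOT"): -0.5,
    ("ADJ_ROOT", "CAUSATIVE"): -0.7,
    ("ADJ_ROOT", "ADJ_END"): -0.9,
    ("CAUSATIVE", "NOUN_END"): -0.3,
    ("NOUN_END", "ACCUSATIVE"): -0.4,
}

def best_segmentation(word):
    # best[i]: (log prob, morpheme list, class of last morpheme) for word[:i]
    best = {0: (0.0, [], "START")}
    for i in range(1, len(word) + 1):
        for j in range(i):
            morph = word[j:i]
            if j not in best or morph not in LEXICON:
                continue
            prob, segs, prev = best[j]
            step = TRANS.get((prev, LEXICON[morph]))
            if step is None:            # disallowed class transition
                continue
            cand = (prob + step, segs + [morph], LEXICON[morph])
            if i not in best or cand[0] > best[i][0]:
                best[i] = cand
    return best.get(len(word), (None, None, None))[1]

print(best_segmentation("sanigon"))   # ['san', 'ig', 'o', 'n']
print(best_segmentation("sana"))      # ['san', 'a']
```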


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Xiaoqiang Chi ◽  
Yang Xiang

Paraphrase generation is an essential yet challenging task in natural language processing. Neural-network-based approaches towards paraphrase generation have achieved remarkable success in recent years. Previous neural paraphrase generation approaches ignore linguistic knowledge such as part-of-speech information, regardless of its availability. The underlying assumption is that neural nets could learn such information implicitly when given sufficient data. However, it would be difficult for neural nets to learn such information properly when data are scarce. In this work, we probe the efficacy of explicit part-of-speech information for the task of paraphrase generation in low-resource scenarios. To this end, we devise three mechanisms to fuse part-of-speech information under the framework of sequence-to-sequence learning. We demonstrate the utility of part-of-speech information in low-resource paraphrase generation through extensive experiments on multiple datasets of varying sizes and genres.
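
One possible fusion mechanism can be sketched as follows: a POS-tag embedding is concatenated with the word embedding at the encoder input of a sequence-to-sequence model. This PyTorch sketch illustrates the general idea only; the dimensions and architecture are assumptions and do not reproduce the paper's three mechanisms.

```python
import torch
import torch.nn as nn

class PosFusedEncoder(nn.Module):
    def __init__(self, vocab_size=1000, pos_size=20, word_dim=128, pos_dim=16, hidden=256):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        self.pos_emb = nn.Embedding(pos_size, pos_dim)
        # Encoder RNN consumes the concatenated word + POS representation
        self.rnn = nn.GRU(word_dim + pos_dim, hidden, batch_first=True)

    def forward(self, word_ids, pos_ids):
        x = torch.cat([self.word_emb(word_ids), self.pos_emb(pos_ids)], dim=-1)
        return self.rnn(x)   # (encoder outputs, final hidden state) for the decoder

enc = PosFusedEncoder()
words = torch.randint(0, 1000, (2, 7))   # batch of 2 sentences, 7 tokens each
pos = torch.randint(0, 20, (2, 7))       # corresponding POS-tag ids
outputs, h = enc(words, pos)
print(outputs.shape)                      # torch.Size([2, 7, 256])
```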

