Arabic Book Retrieval using Class and Book Index Based Term Weighting

One of the most common problems in information retrieval is document ranking. A document ranking system collects search terms from the user and retrieves documents ordered by relevance. Vector space models based on TF.IDF term weighting are the most common approach to this problem. This study addresses automatic retrieval from a collection of Islamic Fiqh (law) books. The collection contains many books, each with tens to hundreds of pages. Each page is treated as a document to be ranked against the user query. We developed a class-based indexing method called inverse class frequency (ICF) and a book-based indexing method called inverse book frequency (IBF) for this Arabic information retrieval task. These methods were then combined with the existing weighting scheme to form TF.IDF.ICF.IBF. The term weighting method is also used for feature selection because of the high dimensionality of the feature space. The proposed method was tested on a dataset of 13 Arabic Fiqh e-books. The experimental results showed that the proposed method achieved higher precision, recall, and F-measure than the other three methods across the feature selection settings tested. The best performance was obtained with the best 1000 features: precision of 76%, recall of 74%, and F-measure of 75%.
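
The combined weighting described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the formulas, so the sketch assumes each inverse frequency has the form 1 + log(N / n_t), and the function names, the input layout, and the rule used to pick the best features are all hypothetical.

```python
import math
from collections import Counter


def tf_idf_icf_ibf(pages):
    """Weight every term on every page by TF * IDF * ICF * IBF.

    pages: list of (book_id, class_id, tokens) tuples, one per page.
    Returns one {term: weight} dict per page, in the same order.

    Assumption (not from the abstract): each inverse frequency is
    1 + log(N / n_t), so a term found in every unit gets factor 1, not 0.
    """
    n_pages = len(pages)
    n_books = len({book for book, _, _ in pages})
    n_classes = len({cls for _, cls, _ in pages})

    page_freq = Counter()   # term -> number of pages containing it
    books_with = {}         # term -> set of books containing it
    classes_with = {}       # term -> set of classes containing it
    for book, cls, tokens in pages:
        for term in set(tokens):
            page_freq[term] += 1
            books_with.setdefault(term, set()).add(book)
            classes_with.setdefault(term, set()).add(cls)

    weights = []
    for _, _, tokens in pages:
        page_weights = {}
        for term, count in Counter(tokens).items():
            idf = 1 + math.log(n_pages / page_freq[term])
            icf = 1 + math.log(n_classes / len(classes_with[term]))
            ibf = 1 + math.log(n_books / len(books_with[term]))
            page_weights[term] = count * idf * icf * ibf
        weights.append(page_weights)
    return weights


def top_features(weights, k):
    """Keep the k terms with the largest weight on any page (assumed rule)."""
    best = {}
    for page_weights in weights:
        for term, value in page_weights.items():
            best[term] = max(best.get(term, 0.0), value)
    return sorted(best, key=best.get, reverse=True)[:k]
```

Under these assumptions, a term confined to a single book or class is boosted relative to a term spread across the whole collection, which is the intuition the ICF and IBF factors are meant to capture.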

Information Retrieval by Modified Term Weighting Method Using Random Walk Model with Query Term Position Ranking

2009 International Conference on Signal Processing Systems ◽

10.1109/icsps.2009.122 ◽

2009 ◽

Cited By ~ 3

Author(s):

Abu Shamim Mohammad Arif ◽

Md Masudur Rahman ◽

Shamima Yeasmin Mukta

Keyword(s):

Information Retrieval ◽

Random Walk ◽

Random Walk Model ◽

Query Term ◽

Term Weighting ◽

Weighting Method

Cyberbullying identification in twitter using support vector machine and information gain based feature selection

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v18.i3.pp1494-1500 ◽

2020 ◽

Vol 18 (3) ◽

pp. 1494

Author(s):

Ni Made Gita Dwi Purnamasari ◽

M. Ali Fauzi ◽

Indriati Indriati ◽

Liana Shinta Dewi

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Information Gain ◽

Support Vector ◽

Term Weighting ◽

Machine Method ◽

Support Vector Machine Method ◽

Positive Class ◽

Media Applications ◽

F Measure

Cyberbullying is one of the actions that violate the ITE Law, where the crime is committed on social media applications such as Twitter. It is difficult to detect if no one reports the tweet. Cyberbullying tweet identification aims to classify tweets that contain bullying. Classification is performed with the Support Vector Machine (SVM) method, which seeks the hyperplane separating the negative and positive classes. Because this is a text classification task, the more data is used, the more features are produced; this research therefore also uses Information Gain for feature selection, to filter out features that are not relevant to the classification. The system starts with text preprocessing: tokenizing, filtering, stemming, and term weighting. Information Gain feature selection is then performed by calculating the entropy value of each term. Classification is then carried out on the selected terms, and the system outputs whether or not the tweet is bullying. The SVM method achieved accuracy of 75%, precision of 70.27%, recall of 86.66%, and F-measure of 77.61% in the experiment with maximum iteration = 20, λ = 0.5, γ = 0.001, ε = 0.000001, and C = 1. The best Information Gain threshold is 90%, with accuracy of 76.66%, precision of 72.22%, recall of 86.66%, and F-measure of 78.78%.
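
The entropy-based feature selection step can be illustrated with the standard information gain formula for term presence: IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)]. The sketch below is an assumption-laden example rather than the paper's code: the function names are hypothetical, and treating the threshold as "keep the top fraction of terms ranked by IG" is one possible reading of the abstract.

```python
import math


def entropy(counts):
    """Shannon entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Information gain of a term's presence/absence for the class labels.

    docs: list of token lists; labels: parallel list of class labels.
    """
    classes = sorted(set(labels))
    with_term = [lab for doc, lab in zip(docs, labels) if term in doc]
    without_term = [lab for doc, lab in zip(docs, labels) if term not in doc]

    def class_counts(subset):
        return [subset.count(c) for c in classes]

    n = len(labels)
    conditional = (
        (len(with_term) / n) * entropy(class_counts(with_term))
        + (len(without_term) / n) * entropy(class_counts(without_term))
    )
    return entropy(class_counts(labels)) - conditional


def select_features(docs, labels, keep_ratio):
    """Rank the vocabulary by IG and keep the top fraction (assumed rule)."""
    vocab = sorted({term for doc in docs for term in doc})
    ranked = sorted(
        vocab, key=lambda t: information_gain(docs, labels, t), reverse=True
    )
    return ranked[: max(1, int(len(ranked) * keep_ratio))]
```

A term that appears only in one class scores the full class entropy, while a term spread evenly across both classes scores zero and is the first to be dropped.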

An effective term weighting method using random walk model for information retrieval

2008 International Conference on Computer and Communication Engineering ◽

10.1109/iccce.2008.4580827 ◽

2008 ◽

Author(s):

Md. Rafiqul Islam ◽

Buddha Dev Sarker ◽

Md. Rakibul Islam

Keyword(s):

Information Retrieval ◽

Random Walk ◽

Random Walk Model ◽

Term Weighting ◽

Weighting Method

Efficient Feature Selection and Domain Relevance Term Weighting Method for Document Classification

2010 Second International Conference on Computer Engineering and Applications ◽

10.1109/iccea.2010.228 ◽

2010 ◽

Cited By ~ 2

Author(s):

Aurangzeb Khan ◽

Baharum Baharudin ◽

Khairullah Khan

Keyword(s):

Feature Selection ◽

Document Classification ◽

Term Weighting ◽

Weighting Method

A nonparametric term weighting method for information retrieval based on measuring the divergence from independence

Information Retrieval ◽

10.1007/s10791-013-9225-4 ◽

2013 ◽

Vol 17 (2) ◽

pp. 153-176 ◽

Cited By ~ 13

Author(s):

İlker Kocabaş ◽

Bekir Taner Dinçer ◽

Bahar Karaoğlan

Keyword(s):

Information Retrieval ◽

Term Weighting ◽

Weighting Method

TERM WEIGHTING BASED ON POSITIVE IMPACT FACTOR QUERY FOR ARABIC FIQH DOCUMENT RANKING

Jurnal Ilmu Komputer dan Informasi ◽

10.21609/jiki.v10i1.408 ◽

2017 ◽

Vol 10 (1) ◽

pp. 29

Author(s):

Rizka Sholikah ◽

Dhian Kartika ◽

Agus Zainal Arifin ◽

Diana Purwitasari

Keyword(s):

Impact Factor ◽

Positive Impact ◽

Decisive Factor ◽

Experimental Result ◽

Term Weighting ◽

Weighting Method ◽

Text Documents ◽

Document Ranking ◽

F Measure ◽

Better Than

The query is one of the most decisive factors in document searching. A query contains several words, one of which is the key term: the word that carries more information and value than the others in the query. Key terms can be used with any kind of text document, including Arabic Fiqh documents. Using the key term in the term weighting process can improve the relevance of the results. In Arabic Fiqh document searching, an unsuitable term weighting method diminishes the importance of the key term. In this paper, we propose a new term weighting method based on Positive Impact Factor Query (PIFQ) for Arabic Fiqh document ranking. PIFQ is calculated from the key term's frequency in each Fiqh category (mazhab). A key term that appears frequently in a certain mazhab gets a higher score for that mazhab, and vice versa. After the PIFQ values are obtained, TF.IDF is calculated for each word. The PIFQ weight is then combined with the TF.IDF result to produce a new weight for each word. Experiments performed on a number of queries using 143 Arabic Fiqh documents show that the proposed method is better than traditional TF.IDF, with precision, recall, and F-measure of 77.9%, 83.1%, and 80.1%, respectively.
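
The abstract does not state the PIFQ formula or how it is combined with TF.IDF, so the following is only a hypothetical sketch of the general idea. It assumes that the score for a category is the share of the key term's occurrences that fall in that category, and that the combination is a simple multiplicative boost; both choices, and all names, are illustrative assumptions rather than the authors' definitions.

```python
def pifq_scores(key_term, docs_by_category):
    """Per-category score for a key term (assumed form, not from the paper).

    docs_by_category: {category: list of token lists}.
    Returns {category: share of the key term's occurrences in that category}.
    """
    counts = {
        category: sum(doc.count(key_term) for doc in docs)
        for category, docs in docs_by_category.items()
    }
    total = sum(counts.values())
    return {
        category: (n / total if total else 0.0)
        for category, n in counts.items()
    }


def combined_weight(tfidf_weight, pifq):
    """Boost a TF.IDF weight by the category's PIFQ score (assumed rule)."""
    return tfidf_weight * (1.0 + pifq)
```

With this assumed form, documents from the category where the key term is concentrated receive larger term weights, and documents from categories where it never occurs keep their plain TF.IDF weights.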

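Rancang Bangun Aplikasi UMN Library Catalog Menggunakan Metode Rocchio Relevance Feedback [Design and Development of the UMN Library Catalog Application Using the Rocchio Relevance Feedback Method]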

Jurnal ULTIMA InfoSys ◽

10.31937/si.v9i1.684 ◽

2018 ◽

Vol 9 (1) ◽

pp. 9-17

Author(s):

Marcel Bonar Kristanda ◽

Seng Hansun ◽

Albert Albert

Keyword(s):

Information System ◽

User Experience ◽

Relevance Feedback ◽

Android Application ◽

Library Catalog ◽

Relevant Result ◽

User Query ◽

Index Terms ◽

Feedback Method ◽

F Measure

A library catalog is a documentation or list of all library collections. A problem was identified in the book search process of the library catalog in Universitas Multimedia Nusantara's library information system: the relevance of results to the user's query input. This research aims to design and build a library catalog application on the Android platform that increases the relevance of database search results using the Rocchio Relevance Feedback method, together with a user experience measurement. The user experience analysis showed a good response, with a score of 91.18% across all factors, and the relevance evaluation gave 71.43% precision, 100% recall, and 83.33% F-measure. The difference in relevant results between the Senayan Library Information System (SLiMS) and the new Android application was 36.11%. The Android application was therefore shown to give relevant results based on relevance rank. Index Terms: Rocchio, Relevance, Feedback, Search, Book, Application, Android, Library.
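
For readers unfamiliar with Rocchio relevance feedback, the standard update moves the query vector toward the centroid of documents judged relevant and away from the centroid of those judged non-relevant: q' = alpha * q + beta * mean(relevant) - gamma * mean(non-relevant). The sketch below shows that textbook formula on sparse term-weight dicts. The function name and the default values of alpha, beta, and gamma are assumptions (commonly quoted defaults), not parameters reported in the abstract.

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return the updated query vector as a {term: weight} dict.

    query and every entry of relevant / non_relevant are {term: weight} dicts.
    alpha, beta, gamma are illustrative defaults, not values from the paper.
    """
    updated = {term: alpha * weight for term, weight in query.items()}

    def add_centroid(docs, scale):
        # Add scale * (mean document vector) to the running query vector.
        if not docs:
            return
        for doc in docs:
            for term, weight in doc.items():
                updated[term] = updated.get(term, 0.0) + scale * weight / len(docs)

    add_centroid(relevant, beta)
    add_centroid(non_relevant, -gamma)

    # Terms whose weight ends up non-positive are dropped, which for a
    # sparse vector is equivalent to clipping them at zero.
    return {term: weight for term, weight in updated.items() if weight > 0}
```

Terms that appear only in non-relevant documents disappear from the updated query, while terms shared by relevant documents are added to it, so the next search is pulled toward the material the user marked as useful.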

Designing a Chat-Bot for College Information using Information Retrieval and Automatic Text Summarization Techniques

Current Chinese Computer Science ◽

10.2174/2665997201999201022191540 ◽

2020 ◽

Vol 01 ◽

Author(s):

Radha Guha

Keyword(s):

Information Retrieval ◽

Language Processing ◽

Latent Dirichlet Allocation ◽

Semantic Analysis ◽

Text Summarization ◽

The Internet ◽

Specific Domain ◽

User Query ◽

College Information ◽

Chat Bot

Background: In the era of information overload, it is very difficult for a human reader to quickly make sense of the vast information available on the internet. Even for a specific domain such as a college or university website, it may be difficult for a user to browse through all the links to get relevant answers quickly. Objective: In this scenario, a chat-bot that can answer questions about college information and compare colleges would be very useful and novel. Methods: In this paper, a novel conversational-interface chat-bot application with information retrieval and text summarization skills is designed and implemented. First, the chat-bot has a simple dialog skill: when it can understand the intent of the user query, it responds from the stored collection of answers. Second, for unknown queries, the chat-bot can search the internet and then perform text summarization using advanced techniques of natural language processing (NLP) and text mining (TM). Results: Advances in NLP for information retrieval and text summarization using the machine learning techniques Latent Semantic Analysis (LSI), Latent Dirichlet Allocation (LDA), Word2Vec, Global Vector (GloVe), and TextRank are first reviewed and compared, and then implemented in the chat-bot design. The chat-bot improves user experience tremendously by answering specific queries concisely, which takes less time than reading the entire document. Students, parents, and faculty can more efficiently get answers on a variety of topics such as admission criteria, fees, course offerings, notice board, attendance, grades, placements, faculty profiles, research papers, and patents. Conclusion: The purpose of this paper was to follow the advancement in NLP technologies and implement them in a novel application.

Semantic Representation of a Geo-Social User Profile for a Personalised Information Retrieval

Journal of Information & Knowledge Management ◽

10.1142/s0219649221500441 ◽

2021 ◽

pp. 2150044

Author(s):

Tahar Rafa ◽

Samir Kechid

Keyword(s):

Information Retrieval ◽

Information Search ◽

Semantic Representation ◽

User Profile ◽

Search Process ◽

Search System ◽

User Interactions ◽

User Interests ◽

Situational Contexts ◽

User Query

User-centred information retrieval needs to introduce semantics into user modelling for a meaningful representation of user interests. A semantic representation of the user's interests helps to improve the identification of the user's future cognitive needs. In this paper, we present a semantic-based approach for personalised information retrieval. The approach is based on the design and exploitation of a user profile that represents the user and the user's interests. In this user profile, we combine ontological semantics derived from the WordNet ontology with personal semantics derived from the user's interactions with the search system and from the social and situational contexts of the user's previous searches. The personal semantics treats the co-occurrence relations between relevant components of the user profile as semantic links. The user profile is used to improve two important phases of the information search process: (i) expansion of the initial user query and (ii) adaptation of the search results to the user interests.

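Support Vector Machine VS Information Gain: Analisis Sentimen Cyberbullying di Twitter Indonesia [Support Vector Machine VS Information Gain: Sentiment Analysis of Cyberbullying on Indonesian Twitter]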

Jurnal ULTIMA InfoSys ◽

10.31937/si.v11i2.1740 ◽

2020 ◽

Vol 11 (2) ◽

pp. 107-111

Author(s):

Christevan Destitus ◽

Wella Wella ◽

Suryasari Suryasari

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Text Mining ◽

Information Gain ◽

Text Processing ◽

Support Vector ◽

Term Weighting ◽

System Process ◽

Research Stage

This study aims to classify tweets on Twitter using the Support Vector Machine and Information Gain methods. The classification seeks a hyperplane that separates the negative and positive classes. The research stage includes a system process of text mining and text processing, with the stages of tokenizing, filtering, stemming, and term weighting. Feature selection is then performed with Information Gain, which calculates the entropy value of each word. Classification is then carried out on the selected features, and the output identifies whether or not the tweet is bullying. The study found that the Support Vector Machine and Information Gain methods produced satisfactory results.
