Arabic Book Retrieval using Class and Book Index Based Term Weighting

Author(s): M. Ali Fauzi, Agus Zainal Arifin, Anny Yuniarti

One of the most common issues in information retrieval is document ranking. A document ranking system collects search terms from the user and retrieves documents in order of relevance. Vector space models based on TF.IDF term weighting are the most common method for this task. In this study, we are concerned with the automatic retrieval of an Islamic Fiqh (Law) book collection. This collection contains many books, each of which has tens to hundreds of pages. Each page of a book is treated as a document that is ranked against the user query. We developed a class-based indexing method called inverse class frequency (ICF) and a book-based indexing method called inverse book frequency (IBF) for this Arabic information retrieval task. These methods were then combined with the previous method, yielding TF.IDF.ICF.IBF. The term weighting method is also used for feature selection because of the high dimensionality of the feature space. The proposed method was tested on a dataset of 13 Arabic Fiqh e-books. The experimental results showed that it achieved higher precision, recall, and F-measure than the other three methods across all feature selection variations. Its best performance was obtained with the best 1000 features: a precision of 76%, a recall of 74%, and an F-measure of 75%.
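The abstract names the TF.IDF.ICF.IBF product but does not give its formulas. The sketch below assumes the usual inverse-frequency form, log(total / containing), for IDF over pages, ICF over classes, and IBF over books, and multiplies them with raw term frequency. Adding 1 to ICF and IBF (so a term found in every class or book is not zeroed out) and ranking features by summed weight are our own assumptions, not the authors' stated choices; the function names are hypothetical.

```python
import math
from collections import Counter

def inverse_freq(total, containing, smooth=False):
    """log(total / containing); with smooth=True, +1 so ubiquitous terms keep some weight."""
    value = math.log(total / containing)
    return value + 1.0 if smooth else value

def tf_idf_icf_ibf(pages):
    """pages: list of (tokens, class_label, book_id); each page is one document."""
    n_pages = len(pages)
    n_classes = len({label for _, label, _ in pages})
    n_books = len({book for _, _, book in pages})
    df = Counter()
    term_classes, term_books = {}, {}
    for tokens, label, book in pages:
        for term in set(tokens):
            df[term] += 1
            term_classes.setdefault(term, set()).add(label)
            term_books.setdefault(term, set()).add(book)
    weights = []
    for tokens, _, _ in pages:
        page_weights = {}
        for term, tf in Counter(tokens).items():
            idf = inverse_freq(n_pages, df[term])
            icf = inverse_freq(n_classes, len(term_classes[term]), smooth=True)  # assumed smoothing
            ibf = inverse_freq(n_books, len(term_books[term]), smooth=True)    # assumed smoothing
            page_weights[term] = tf * idf * icf * ibf
        weights.append(page_weights)
    return weights

def select_features(weights, k):
    """Keep the k terms with the largest summed weight over all pages (assumed criterion)."""
    totals = Counter()
    for page_weights in weights:
        totals.update(page_weights)  # Counter.update adds the float weights per term
    return [term for term, _ in totals.most_common(k)]
```

On a toy three-page collection, a term confined to one class and one book (for example `wudu`) outweighs a term spread across both books, which is the effect ICF and IBF are meant to add on top of IDF.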

Author(s): Ni Made Gita Dwi Purnamasari, M. Ali Fauzi, Indriati Indriati, Liana Shinta Dewi

Cyberbullying is one of the acts that violate the ITE Law; it is committed on social media applications such as Twitter. Such acts are difficult to detect unless someone reports the tweet. Cyberbullying tweet identification aims to classify tweets that contain bullying. Classification is performed with the Support Vector Machine (SVM) method, which seeks the hyperplane that separates the negative and positive classes. Because this is a text classification task, the more data are used, the more features are produced; therefore, this research also uses Information Gain for feature selection, removing features that are not relevant to the classification. The system starts with text preprocessing: tokenizing, filtering, stemming, and term weighting. Information Gain feature selection is then performed by calculating the entropy value of each term. Classification is then carried out on the selected terms, and the system outputs whether or not the tweet is bullying. The SVM method achieved an accuracy of 75%, precision of 70.27%, recall of 86.66%, and F-measure of 77.61% with maximum iterations = 20, λ = 0.5, γ = 0.001, ε = 0.000001, and C = 1. The best Information Gain threshold was 90%, giving an accuracy of 76.66%, precision of 72.22%, recall of 86.66%, and F-measure of 78.78%.
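Information Gain scores a term by how much knowing its presence or absence reduces class entropy: IG(t) = H(C) − [P(t)·H(C | t) + P(¬t)·H(C | ¬t)]. The sketch below implements that standard definition over binary term presence. Reading the "90% threshold" as "keep the top 90% of terms ranked by IG" is an assumption on our part, and `select_by_ig` is a hypothetical helper name.

```python
import math

def entropy(counts):
    """Shannon entropy (bits) of a list of class counts."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c:
            p = c / total
            h -= p * math.log2(p)
    return h

def information_gain(docs, labels, term):
    """docs: list of token sets; labels: class label per document."""
    classes = sorted(set(labels))
    n = len(docs)
    with_t = [lab for doc, lab in zip(docs, labels) if term in doc]
    without_t = [lab for doc, lab in zip(docs, labels) if term not in doc]

    def class_counts(subset):
        return [subset.count(c) for c in classes]

    conditional = 0.0
    for part in (with_t, without_t):
        if part:
            conditional += len(part) / n * entropy(class_counts(part))
    return entropy(class_counts(labels)) - conditional

def select_by_ig(docs, labels, keep_ratio):
    """Keep the top keep_ratio fraction of the vocabulary by IG (assumed threshold meaning)."""
    vocab = sorted(set().union(*docs))  # sorted first so ties break deterministically
    ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t), reverse=True)
    k = max(1, round(len(ranked) * keep_ratio))
    return ranked[:k]
```

A term that appears only in bullying tweets gets the maximum IG of 1 bit on a balanced two-class set, while a term spread evenly over both classes gets 0.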


2013, Vol 17 (2), pp. 153-176
Author(s): İlker Kocabaş, Bekir Taner Dinçer, Bahar Karaoğlan

2017, Vol 10 (1), pp. 29
Author(s): Rizka Sholikah, Dhian Kartika, Agus Zainal Arifin, Diana Purwitasari

The query is one of the most decisive factors in document searching. A query contains several words, one of which becomes the key term. A key term is a word that carries more information and value than the other words in the query. Key terms can be used with any kind of text document, including Arabic Fiqh documents. Using the key term in the term weighting process can improve the relevance of the results. In Arabic Fiqh document searching, an improper term weighting method can diminish the importance of the key term. In this paper, we propose a new term weighting method based on the Positive Impact Factor Query (PIFQ) for Arabic Fiqh document ranking. PIFQ is calculated from the key term's frequency in each category (mazhab) of Fiqh: a key term that appears frequently in a certain mazhab receives a higher score for that mazhab, and vice versa. After the PIFQ values are obtained, TF.IDF is calculated for each word. The PIFQ weight is then combined with the TF.IDF result to produce a new weight for each word. Experimental results on a number of queries over 143 Arabic Fiqh documents show that the proposed method outperforms traditional TF.IDF, with a precision, recall, and F-measure of 77.9%, 83.1%, and 80.1%, respectively.
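The abstract describes PIFQ only qualitatively, so the following is a hypothetical illustration, not the authors' formula. We assume PIFQ for a mazhab is the share of all key-term occurrences that fall in that mazhab's documents, and that it is combined with TF.IDF by boosting the key term's weight by a factor of (1 + PIFQ). Both choices, and the function names, are assumptions.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Plain TF.IDF: raw term frequency times log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: f * math.log(n / df[t]) for t, f in Counter(doc).items()} for doc in docs]

def pifq(docs, mazhabs, key_term):
    """HYPOTHETICAL PIFQ: share of the key term's occurrences found in each mazhab."""
    counts = Counter()
    for doc, mazhab in zip(docs, mazhabs):
        counts[mazhab] += doc.count(key_term)
    total = sum(counts.values())
    return {m: (counts[m] / total if total else 0.0) for m in set(mazhabs)}

def rank(docs, mazhabs, query, key_term):
    """Score = sum of query-term TF.IDF weights, with the key term boosted by (1 + PIFQ)."""
    weights = tf_idf(docs)
    factor = pifq(docs, mazhabs, key_term)
    scored = []
    for i, (w, mazhab) in enumerate(zip(weights, mazhabs)):
        score = 0.0
        for term in query:
            boost = 1.0 + factor[mazhab] if term == key_term else 1.0  # assumed combination
            score += w.get(term, 0.0) * boost
        scored.append((score, i))
    return [i for _, i in sorted(scored, reverse=True)]
```

In this toy setting, documents from the mazhab where the key term is concentrated are pushed up the ranking, which mirrors the qualitative behaviour the abstract describes.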


2018, Vol 9 (1), pp. 9-17
Author(s): Marcel Bonar Kristanda, Seng Hansun, Albert Albert

A library catalog is a documented list of all library collections. However, a problem was identified when searching for books in the library catalog of the Universitas Multimedia Nusantara library information system: the results were not always relevant to the user's query. This research aims to design and build a library catalog application on the Android platform that increases the relevance of database search results using the Rocchio relevance feedback method, together with a user experience measurement. The user experience analysis showed a good response, with an overall score of 91.18% across all factors, and the relevance evaluation gave 71.43% precision, 100% recall, and an 83.33% F-measure. The difference in relevant results between the Senayan Library Information System (SLiMS) and the new Android application was 36.11%. Therefore, the Android application was shown to return relevant results ordered by relevance rank. Index Terms: Rocchio, Relevance, Feedback, Search, Book, Application, Android, Library.
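The standard Rocchio update moves the query vector toward the centroid of documents judged relevant and away from the centroid of non-relevant ones: q' = α·q + β·mean(relevant) − γ·mean(non-relevant). The sketch below uses sparse dict vectors, clips negative weights to zero (a common convention), and ranks by cosine similarity. The α = 1.0, β = 0.75, γ = 0.15 defaults are assumed typical values; the abstract does not report the ones used.

```python
import math

def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return the modified query vector; vectors are dicts mapping term -> weight."""
    terms = set(query)
    for doc in relevant + non_relevant:
        terms.update(doc)
    new_query = {}
    for t in terms:
        w = alpha * query.get(t, 0.0)
        if relevant:
            w += beta * sum(d.get(t, 0.0) for d in relevant) / len(relevant)
        if non_relevant:
            w -= gamma * sum(d.get(t, 0.0) for d in non_relevant) / len(non_relevant)
        if w > 0:  # negative weights are commonly clipped to zero
            new_query[t] = w
    return new_query

def cosine(a, b):
    """Cosine similarity between two sparse dict vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

For example, if a user searching "python" marks a programming book as relevant and a book about snakes as non-relevant, the updated query gains weight on "programming", drops "snake", and ranks the programming book higher.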


Author(s): Radha Guha

Background: In the era of information overload, it is very difficult for a human reader to quickly make sense of the vast amount of information available on the internet. Even within a specific domain, such as a college or university website, it may be difficult for a user to browse through all the links to find relevant answers quickly.

Objective: In this scenario, a chat-bot that can answer questions about college information and compare colleges would be very useful and novel.

Methods: In this paper, a novel conversational-interface chat-bot application with information retrieval and text summarization skills is designed and implemented. First, the chat-bot has a simple dialog skill: when it understands the intent of a user query, it responds from a stored collection of answers. Second, for unknown queries, the chat-bot searches the internet and then performs text summarization using advanced natural language processing (NLP) and text mining (TM) techniques.

Results: The advances in NLP for information retrieval and text summarization using the machine learning techniques Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), Word2Vec, Global Vectors (GloVe), and TextRank are first reviewed and compared in this paper before being implemented in the chat-bot design. The chat-bot improves user experience by answering specific queries concisely, which takes less time than reading the entire document. Students, parents, and faculty can obtain a variety of information, such as admission criteria, fees, course offerings, notice board, attendance, grades, placements, faculty profiles, research papers, and patents, more efficiently.

Conclusion: The purpose of this paper was to follow the advances in NLP technologies and implement them in a novel application.
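Of the summarization techniques listed, TextRank is the simplest to illustrate. It builds a graph whose nodes are sentences, weights each edge by word overlap normalized by the log of the two sentence lengths, runs weighted PageRank (damping 0.85), and keeps the top-scoring sentences. The sketch below is a minimal version under those assumptions, with a naive regex sentence splitter; it is not the chat-bot's actual summarizer.

```python
import math
import re

def split_sentences(text):
    """Naive splitter on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def similarity(words_a, words_b):
    """TextRank sentence similarity: shared words over log(|A|) + log(|B|)."""
    denom = math.log(len(words_a)) + math.log(len(words_b)) if words_a and words_b else 0.0
    if denom <= 0:
        return 0.0
    return len(set(words_a) & set(words_b)) / denom

def textrank_summary(text, k=2, d=0.85, iterations=50):
    """Return the k highest-ranked sentences in their original order."""
    sentences = split_sentences(text)
    words = [re.findall(r"\w+", s.lower()) for s in sentences]
    n = len(sentences)
    weight = [[similarity(words[i], words[j]) if i != j else 0.0 for j in range(n)]
              for i in range(n)]
    out_sum = [sum(row) for row in weight]
    score = [1.0] * n
    for _ in range(iterations):
        # Weighted PageRank update; the comprehension reads the previous scores.
        score = [
            (1 - d) + d * sum(weight[j][i] / out_sum[j] * score[j]
                              for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    best = sorted(range(n), key=lambda i: score[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]
```

A sentence that shares vocabulary with many others becomes central in the graph and is selected first, which is why TextRank needs no training data.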


Author(s): Tahar Rafa, Samir Kechid

User-centred information retrieval needs to introduce semantics into user modelling to obtain a meaningful representation of user interests. A semantic representation of user interests helps to better identify the user's future cognitive needs. In this paper, we present a semantic-based approach to personalised information retrieval. The approach is based on the design and exploitation of a user profile that represents the user and their interests. In this profile, we combine ontological semantics derived from the WordNet ontology with personal semantics derived from the user's interactions with the search system and from the social and situational contexts of their previous searches. The personal semantics treats co-occurrence relations between relevant components of the user profile as semantic links. The user profile is used to improve two important phases of the information search process: (i) expansion of the initial user query and (ii) adaptation of the search results to the user's interests.
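To make phase (i) concrete, here is a hypothetical sketch of profile-based query expansion. It assumes the profile is a set of token lists from documents the user found relevant, counts term co-occurrence within each document as the "semantic link" strength, and adds each query term's strongest neighbour. A small hand-written synonym dict stands in for WordNet so the example runs with the standard library only; none of these names or choices come from the paper.

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence(profile_docs):
    """Count how often each unordered term pair appears in the same profile document."""
    co = Counter()
    for doc in profile_docs:
        for a, b in combinations(sorted(set(doc)), 2):
            co[(a, b)] += 1
    return co

def expand_query(query, co, synonyms=None, per_term=1):
    """Add the strongest co-occurring profile terms and (stand-in) ontology synonyms."""
    expanded = list(query)
    for q in query:
        neighbours = Counter()
        for (a, b), count in co.items():
            if a == q:
                neighbours[b] += count
            elif b == q:
                neighbours[a] += count
        for term, _ in neighbours.most_common(per_term):  # personal semantics
            if term not in expanded:
                expanded.append(term)
        for term in (synonyms or {}).get(q, []):  # hypothetical WordNet substitute
            if term not in expanded:
                expanded.append(term)
    return expanded
```

With a profile built from documents about Jaguar cars, the query "car" is expanded with "jaguar" from the personal co-occurrence links and "automobile" from the synonym source, so both kinds of semantics contribute.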


2020, Vol 11 (2), pp. 107-111
Author(s): Christevan Destitus, Wella Wella, Suryasari Suryasari

This study aims to classify tweets on Twitter using the Support Vector Machine (SVM) and Information Gain methods. The SVM classifier aims to find a hyperplane that separates the negative and positive classes. The system first applies text mining, with a text preprocessing stage consisting of tokenizing, filtering, stemming, and term weighting. Feature selection is then performed with Information Gain, which calculates the entropy value of each word. Finally, tweets are classified using the selected features, and the output identifies whether or not each tweet is bullying. The results show that combining the Support Vector Machine with Information Gain produces satisfactory results.
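The abstract does not specify how the SVM hyperplane is trained. As a minimal stand-in, the sketch below uses the Pegasos stochastic sub-gradient solver for a linear SVM. This is a swapped-in technique, not the authors' procedure. The bias is handled by appending a constant 1 feature, `lam` is the regularization strength, and labels are assumed to be −1 (not bullying) and +1 (bullying).

```python
import random

def train_pegasos(X, y, lam=0.1, epochs=100, seed=0):
    """Linear SVM via Pegasos; X: feature lists, y: labels in {-1, +1}. Returns weights."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]  # constant bias feature
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink step from the L2 regularizer
            if margin < 1:  # hinge loss active: move toward correct side of the hyperplane
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    """Return +1 (bullying) or -1 (not bullying) for a feature list x."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

In practice, `X` would hold the term weights of the features that survive Information Gain selection. The toy single-feature data used to check the sketch is only meant to confirm that the learned hyperplane separates the two classes.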

