Relevance Feedback using Genetic Algorithm on Information Retrieval for Indonesian Language Documents

Author(s):  
Salman Dziyaul Azmi ◽  
Retno Kusumaningrum

Background: The rapid growth of technological development in Indonesia has resulted in a growing amount of information. Therefore, a new information retrieval environment is necessary for finding documents that match the user’s information needs. Objective: The purpose of this study is to uncover the differences between Relevance Feedback (RF) based on a genetic algorithm and a standard information retrieval system without relevance feedback for Indonesian language documents. Methods: The standard Information Retrieval (IR) system uses the Sastrawi stemmer and the Vector Space Model, while the Genetic Algorithm-based (GA-based) relevance feedback uses roulette-wheel selection and crossover recombination. The evaluation metrics are Mean Average Precision (MAP) and average recall based on user judgments. Results: On two Indonesian language document datasets, namely a thesis abstract dataset and a news dataset, the results show increases of 15.2% and 28.6%, respectively, in MAP compared with the standard IR system. Recall at the 10th position also improved by 7.1% and 10.5%, respectively. The best genetic algorithm parameters obtained for the thesis abstract dataset were a population size of 20 with a crossover probability of 0.7 and a mutation probability of 0.2, while for the news dataset they were a population size of 10 with a crossover probability of 0.5 and a mutation probability of 0.2. Conclusion: Genetic Algorithm-based relevance feedback increases both MAP and average recall at the 10th position of retrieved documents. In general, the best mutation probability is 0.2, whereas the best population size and crossover probability depend on the size of the dataset and the length of the query. Keywords: Genetic Algorithm, Information Retrieval, Indonesian language document, Mean Average Precision, Relevance Feedback
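
The abstract above names roulette-wheel selection, crossover, and mutation but gives no implementation details. The following is a minimal sketch of those three operators, not the authors' code. It assumes each chromosome is a list of query term weights in [0, 1], that crossover is single-point, and that mutation redraws a gene at random; the function names and the `fitness_fn` callback are hypothetical.

```python
import random

def roulette_wheel_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    if total <= 0:
        return rng.choice(population)  # degenerate case: uniform choice
    pick = rng.uniform(0, total)
    running = 0.0
    for individual, fitness in zip(population, fitnesses):
        running += fitness
        if pick < running:
            return individual
    return population[-1]  # guard against floating-point round-off

def crossover(parent_a, parent_b, p_crossover, rng):
    """Single-point crossover (an assumption), applied with probability p_crossover."""
    if len(parent_a) < 2 or rng.random() >= p_crossover:
        return parent_a[:], parent_b[:]
    point = rng.randint(1, len(parent_a) - 1)
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])

def mutate(chromosome, p_mutation, rng):
    """Redraw each gene as a fresh random weight with probability p_mutation."""
    return [rng.random() if rng.random() < p_mutation else gene
            for gene in chromosome]

def next_generation(population, fitness_fn, p_crossover, p_mutation, rng):
    """Build a new population of the same size from the current one."""
    fitnesses = [fitness_fn(c) for c in population]
    children = []
    while len(children) < len(population):
        a = roulette_wheel_select(population, fitnesses, rng)
        b = roulette_wheel_select(population, fitnesses, rng)
        for child in crossover(a, b, p_crossover, rng):
            children.append(mutate(child, p_mutation, rng))
    return children[:len(population)]
```

With the parameters reported for the thesis abstract dataset, one would call `next_generation(population, fitness_fn, 0.7, 0.2, random.Random())` on a population of 20 chromosomes. In a relevance feedback setting, `fitness_fn` would presumably score a candidate query against the documents the user judged relevant.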

2017 ◽  
Vol 865 ◽  
pp. 492-495
Author(s):  
Rong Rong Song

To address the strong nonlinearity and uncertainty of the suspension system, the system was transformed into two different linear subsystems using Taylor’s formula, and a proportional-integral-differential controller based on a genetic algorithm was designed. By optimizing the code, the population size, the crossover probability, the mutation probability, and the maximum number of iterations, we obtained the optimized controller parameters for electromagnet 1 and electromagnet 2, respectively. The simulation results showed that the optimized suspension system had good robustness.


Author(s):  
Anthony Anggrawan ◽  
Azhari

Information Retrieval is the task of searching for information based on a user’s query, with the aim of finding the documents that meet the user’s need. This research uses the Vector Space Model to determine the similarity percentage of each student’s assignment. The application is built with PHP and a MySQL database. The result is presented as a ranking of documents by their similarity to the query, with a mean average precision of 0.874. This value indicates how closely the application agrees with the assessment made by experts; it was obtained from an evaluation using 5 queries compared against 25 sample documents. The more assignments that have to be compared, the more time the similarity computation needs; the running time depends on the number of assignments submitted.
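
The abstract does not say which term weighting the application uses, so the following is only a sketch of Vector Space Model ranking under stated assumptions: documents are already tokenized (and stemmed), weights are TF-IDF with the smoothed variant `log(N / df) + 1`, and similarity is the cosine of the angle between the vectors. All function names are hypothetical.

```python
import math
from collections import Counter

def idf_table(documents):
    """Inverse document frequency for every term in the collection."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    return {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}

def vectorize(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in the collection are ignored."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query_tokens, documents):
    """Return (document index, similarity) pairs, most similar first."""
    idf = idf_table(documents)
    query_vec = vectorize(query_tokens, idf)
    scores = [cosine(query_vec, vectorize(doc, idf)) for doc in documents]
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
```

Every document is vectorized and compared against the query, so the cost of one ranking grows with the number of documents, which is consistent with the abstract's remark about running time.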


2017 ◽  
Vol 49 (3) ◽  
pp. 903-926 ◽  
Author(s):  
Raphaël Cerf

Abstract We introduce a new parameter to discuss the behavior of a genetic algorithm. This parameter is the mean number of exact copies of the best-fit chromosomes from one generation to the next. We believe that the genetic algorithm operates best when this parameter is slightly larger than 1 and we prove two results supporting this belief. We consider the case of the simple genetic algorithm with the roulette wheel selection mechanism. We denote by $\ell$ the length of the chromosomes, $m$ the population size, $p_C$ the crossover probability, and $p_M$ the mutation probability. Our results suggest that the mutation and crossover probabilities should be tuned so that, at each generation, the maximal fitness multiplied by $(1 - p_C)(1 - p_M)^\ell$ is greater than the mean fitness.
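
The tuning rule at the end of this abstract can be checked numerically for a given population. The sketch below is a direct reading of the stated inequality, assuming the mutation probability is applied per bit (which the exponent, the chromosome length, suggests); the function names are hypothetical.

```python
def survival_factor(p_crossover, p_mutation, chromosome_length):
    """Probability that a chromosome undergoes no crossover and no bit
    mutation, i.e. (1 - pC) * (1 - pM) ** length."""
    return (1.0 - p_crossover) * (1.0 - p_mutation) ** chromosome_length

def satisfies_tuning_rule(fitnesses, p_crossover, p_mutation, chromosome_length):
    """True when max fitness times the survival factor exceeds the mean fitness."""
    mean_fitness = sum(fitnesses) / len(fitnesses)
    factor = survival_factor(p_crossover, p_mutation, chromosome_length)
    return max(fitnesses) * factor > mean_fitness
```

For example, with fitness values `[1, 1, 10]`, chromosome length 10, and a mutation probability of 0.01, the rule holds at a crossover probability of 0.5 but fails at 0.7, because the larger crossover probability shrinks the survival factor.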


Author(s):  
Rita Rismala ◽  
Mahmud Dwi Sulistiyo

In this study, we developed a recommendation system that recommends the single best item to a user. From a data mining perspective, building such a top-one recommendation system can be viewed as building a classifier model that assigns data to one particular class. The model used was a linear classifier. A Genetic Algorithm (GA) was used to produce the optimal configuration of the classifier model. The performance of the GA in optimizing the linear classification model was acceptable. For the dataset used, the best combination of parameter values, namely a population size of 50, a crossover probability of 0.7, and a mutation probability of 0.1, yielded an average accuracy of 72.80% with an average processing time of 6.04 seconds. Classification using a GA can therefore be an alternative solution for building a recommendation system, provided that the parameter values are set to suit the problem at hand. Keywords: recommendation system, classification, Genetic Algorithm


2019 ◽  
Vol 27 (2) ◽  
pp. 194-201 ◽  
Author(s):  
Dina Demner-Fushman ◽  
Yassine Mrabet ◽  
Asma Ben Abacha

Abstract Objective: Consumers increasingly turn to the internet in search of health-related information; and they want their questions answered with short and precise passages, rather than needing to analyze lists of relevant documents returned by search engines and reading each document to find an answer. We aim to answer consumer health questions with information from reliable sources. Materials and Methods: We combine knowledge-based, traditional machine and deep learning approaches to understand consumers’ questions and select the best answers from consumer-oriented sources. We evaluate the end-to-end system and its components on simple questions generated in a pilot development of MedlinePlus Alexa skill, as well as the short and long real-life questions submitted to the National Library of Medicine by consumers. Results: Our system achieves 78.7% mean average precision and 87.9% mean reciprocal rank on simple Alexa questions, and 44.5% mean average precision and 51.6% mean reciprocal rank on real-life questions submitted by National Library of Medicine consumers. Discussion: The ensemble of deep learning, domain knowledge, and traditional approaches recognizes question type and focus well in the simple questions, but it leaves room for improvement on the real-life consumers’ questions. Information retrieval approaches alone are sufficient for finding answers to simple Alexa questions. Answering real-life questions, however, benefits from a combination of information retrieval and inference approaches. Conclusion: A pilot practical implementation of research needed to help consumers find reliable answers to their health-related questions demonstrates that for most questions the reliable answers exist and can be found automatically with acceptable accuracy.
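
The two metrics reported in this abstract, mean average precision and mean reciprocal rank, can be computed as sketched below. This is a generic illustration, not the paper's evaluation code. It assumes each query's results are given as a ranked list of booleans (True where the answer at that rank is relevant) and that average precision is normalized by the number of relevant items that appear in the list.

```python
def average_precision(ranked_relevance):
    """Mean of precision@k taken at each rank k that holds a relevant item."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def reciprocal_rank(ranked_relevance):
    """1 / rank of the first relevant item, or 0 if there is none."""
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0

def mean_average_precision(runs):
    """Average of average_precision over all queries."""
    return sum(average_precision(r) for r in runs) / len(runs)

def mean_reciprocal_rank(runs):
    """Average of reciprocal_rank over all queries."""
    return sum(reciprocal_rank(r) for r in runs) / len(runs)
```

Mean reciprocal rank only rewards the position of the first correct answer, while mean average precision accounts for every relevant item in the list, which is why a system can score higher on the former than on the latter.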


2011 ◽  
pp. 140-160
Author(s):  
Sheng-Uei Guan ◽  
Chang Ching Chng ◽  
Fangming Zhu

This chapter proposes the establishment of OntoQuery in an m-commerce agent framework. OntoQuery represents a new query formation approach that combines the usage of ontology and keywords. This approach takes advantage of the tree pathway structure in ontology to form queries visually and efficiently. It also uses keywords to complete the query formation process more efficiently. Present query optimization techniques such as relevance feedback use expensive iterations. The proposed information retrieval scheme focuses on using genetic algorithms to improve computational effectiveness. Queries formed in the earlier stage are mutated by replacing terms with synonyms. The query optimization techniques used include query restructuring by logical terms and replacement of numerical constraints. The fitness function of the genetic algorithm is defined by three elements: the number of documents retrieved, the quality of the documents, and the correlation of queries. The number and quality of documents retrieved give the basic strength of a mutated query.


2014 ◽  
Vol 10 (1) ◽  
pp. 189
Author(s):  
Zulfahmi Indra ◽  
Subanar Subanar

Abstract Supply chain management is critical in business. The core of supply chain management is the distribution process. One distribution problem is the decision strategy for determining how many products must be moved at each stage, from the manufacturer level down to the customer level. This study optimizes a three-level supply chain running from manufacturer to distributor to wholesaler to retailer. The approach taken is an adaptive and distributed genetic algorithm. A solution, in the form of the number of products delivered at each level, is modeled as a chromosome. Genetic parameters such as the number of chromosomes in the population, the crossover probability, and the mutation probability change adaptively according to the condition of the population in each generation. This study used 3 sub-populations that can exchange individuals at any time according to the migration probability. Experiments run 30 times for each combination of genetic parameter values show that the lowest cost obtained is 80,910, which occurs at a crossover probability of 0.4, a mutation probability of 0.1, a migration probability of 0.1, and a migration rate of 0.1. This result is better than that of the stepping stone method, which obtained a cost of 89,825. Keywords: supply chain management, three-level supply chain, adaptive genetic algorithm, distributed genetic algorithm.


2018 ◽  
Vol 173 ◽  
pp. 03051
Author(s):  
Huizhou Yang ◽  
Li Zhang

The main challenge in the service composition problem is how to select and combine many services with similar functions reasonably and efficiently, so as to provide users with better service. This is especially difficult when the number of candidate services is huge. Recent research casts the service composition problem as a multi-objective optimization task, which is commonly tackled with a genetic algorithm. However, fixed crossover and mutation probability settings often cause the genetic algorithm to fall into a local optimum. To improve the performance of the genetic algorithm in the service composition task, this paper proposes an adaptive parameter adjustment strategy that adjusts the crossover probability and mutation probability automatically. The experimental results show that our method greatly improves the maximum fitness of the final solutions compared with the traditional genetic algorithm.
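
The abstract does not describe how the probabilities are adjusted, so the sketch below does not reproduce the paper's strategy. It substitutes a common fitness-based rule (in the spirit of the adaptive genetic algorithm of Srinivas and Patnaik) purely to illustrate the idea: individuals with below-average fitness keep a high operator probability, while above-average individuals are disrupted less the closer they are to the current best. Fitness is assumed to be maximized; the function name and the ceiling values are hypothetical.

```python
def adaptive_probability(fitness, f_max, f_avg, p_high):
    """Scale an operator probability down for above-average individuals.

    fitness: fitness of the individual (for crossover, typically the
             fitter of the two parents)
    f_max, f_avg: best and mean fitness of the current population
    p_high: probability used for average or below-average individuals
    """
    if fitness < f_avg or f_max == f_avg:
        # Weak individual, or a fully converged population: keep exploring.
        return p_high
    return p_high * (f_max - fitness) / (f_max - f_avg)

# Illustrative ceilings only; suitable values are problem-dependent.
P_CROSSOVER_HIGH = 0.9
P_MUTATION_HIGH = 0.1
```

Under this rule the best individual gets probability 0 and so passes to the next generation unchanged, while an individual halfway between the mean and the best gets half of `p_high`.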


2020 ◽  
Vol 39 (4) ◽  
pp. 5407-5416
Author(s):  
Murugan Sivaram ◽  
K. Batri ◽  
Amin Salih Mohammed ◽  
V. Porkodi ◽  
N.V. Kousik

This article explores a Tabu Genetic Algorithm based on odd and even point crossover. Search optimization tools are equipped with exploration and exploitation operators, which help them find the optimal solution. Some problems demand vigorous exploration and minimal exploitation. Vigorous exploration needs specialized operators that are capable of carrying out the task. In this article, we explore one such operator, odd and even point (OEP) crossover. The resulting hybrid GA, the OEP crossover based Tabu GA, has two tuning factors: the tenure period and the OEP crossover probability (Podd). The tenure period may be a single entity or a group of entities, whereas Podd is a single value. Because the tenure period may involve a group of entities, it demands some fine tuning, and this fine tuning may alter the proportion of exploration and exploitation. Hence, we propose a method for selecting the tenure period. The proposed exploration operator and the method for fixing the tenure period have been tested on the data fusion problem in information retrieval. The results look promising.


2014 ◽  
Vol 556-562 ◽  
pp. 4617-4621
Author(s):  
Fu Xing Chen ◽  
Xu Sheng Xie

Query cost is usually an important criterion for a distributed database. The genetic algorithm is an adaptive probabilistic search algorithm, but in a traditional genetic algorithm the crossover and mutation probabilities are usually kept fixed. If the crossover probability is kept large, the genetic algorithm's model is more likely to be damaged; conversely, if the crossover probability is kept small, the search process slows down or even stagnates. If the mutation probability is kept small, new individuals are difficult to produce; conversely, if the mutation probability is kept large, the genetic algorithm degenerates into a pure random search. To solve this problem, an improved genetic algorithm is proposed that uses multiple crossover and mutation probabilities based on the k-means clustering algorithm. The experimental results indicate that the algorithm is effective.

