Applying Similarity Measures to Improve Query Expansion

2021 ◽  
pp. 2053-2063
Author(s):  
Wajih A. Ghani A. Hussain

The rapid evolution of information technologies, especially over the last few decades, has increased the volume of data on the World Wide Web, which is still growing significantly. Retrieving relevant information from the Internet, or from any data source, with a query of only a few words has become a major challenge. To address this, query expansion (QE) plays an important role in improving information retrieval (IR): the user's original query is rewritten as a new query by appending related terms of the same importance. One problem in query expansion is choosing suitable terms, which in turn raises the challenge of retrieving the important documents with high precision, high recall, and a high F-measure. In this paper, we address this problem by applying different similarity measures together with English WordNet. The results show that, with a suitable selection method, English WordNet can be used to improve retrieval efficiency. The proposed approach extracts the terms from all documents and the query, then applies the following steps: preprocessing, expanding the query based on English WordNet, selecting the best terms, weighting the terms, and finally using cosine similarity and Jaccard similarity to obtain the relevant documents. The experiments were run on the DUC2002 dataset, which contains 559 documents distributed over several categories. For random queries, the average precision was 100% with cosine similarity and 84.4% with Jaccard similarity; the average recall was 86.8% with cosine and 73.4% with Jaccard; and the average F-measure was 92% with cosine and 76% with Jaccard.
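The final matching step in the abstract above compares a query with each document using cosine and Jaccard similarity. The following is a minimal sketch of those two measures, assuming terms have already been preprocessed and using raw term counts rather than the paper's own term weighting, which the abstract does not specify. The example query and document are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(query_terms, doc_terms):
    """Cosine of the angle between two term-frequency vectors."""
    q, d = Counter(query_terms), Counter(doc_terms)
    dot = sum(q[t] * d[t] for t in q.keys() & d.keys())
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0


def jaccard_similarity(query_terms, doc_terms):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    q, d = set(query_terms), set(doc_terms)
    union = q | d
    return len(q & d) / len(union) if union else 0.0


# Hypothetical expanded query and document, already reduced to terms.
query = ["storm", "hurricane", "damage"]
document = ["hurricane", "damage", "coast", "damage"]

print(round(cosine_similarity(query, document), 3))   # 0.707
print(round(jaccard_similarity(query, document), 3))  # 0.5
```

Cosine similarity takes term frequency into account, while Jaccard similarity only looks at which terms are present, so the two measures can rank the same documents differently.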

MATICS ◽  
2020 ◽  
Vol 12 (1) ◽  
pp. 10
Author(s):  
Muhammad Fairuz Zumar Rounaqi

Abstract: Hadith are all the words, deeds, and provisions of the Prophet Muhammad SAW, used as the second source of Islamic law after the Al-Quran. The purpose of this study is to build an information retrieval system, called the Query Answering System, that is expected to help users search for and find hadith documents that match their needs. This study implements the Naïve Bayes classifier combined with an Indonesian thesaurus for query expansion to find hadith documents relevant to the input query. In tests on 50 queries, using query expansion gave better results than not using it. For the top-1 result without query expansion, the average recall was 62%, the average precision 62%, the average accuracy 92.4%, and the average F-measure 62%; with query expansion, the average recall was 66%, the average precision 66%, the average accuracy 93.2%, and the average F-measure 66%. Query expansion therefore improved average recall, precision, and F-measure by 4 percentage points each and average accuracy by 0.8 percentage points.

Index Terms: hadith, information retrieval, query expansion, naïve bayes.


2019 ◽  
Vol 5 (1) ◽  
pp. 47 ◽  
Author(s):  
Evan Tanuwijaya ◽  
Safri Adam ◽  
Mohammad Fatoni Anggris ◽  
Agus Zainal Arifin

Keywords are the most important element in searching for information, and using the right keywords yields relevant information. When keywords are used as a query, users write in natural language, so the query may contain words that do not appear in the answer documents prepared by the system. The system cannot directly process the natural language entered by the user, so a step is needed that expands each word the user enters, known as query expansion (QE). The QE method in this research uses word embedding, because word embedding provides words that often appear together with the words in the query. The output of the word embedding is used as input to pseudo relevance feedback, where it is enriched based on the existing answer documents. The QE method was applied and tested in a chatbot application, yielding recall, precision, and F-measure values of 100%, 70%, and 82.35%, respectively. This is 1.49% higher than a previous chatbot without QE, which achieved an accuracy of only 68.51%. Based on these measurements, QE using word embedding and pseudo relevance feedback in a chatbot can handle ambiguous, natural-language user queries and thereby return relevant answers to the user.
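The embedding step described above can be illustrated with a minimal sketch that appends the nearest neighbours of each query term. It assumes a tiny hand-written embedding table (`EMBEDDINGS`, hypothetical values) in place of trained vectors, and it leaves out the pseudo relevance feedback stage.

```python
import math

# Hypothetical 3-dimensional embeddings; a real system would load trained vectors.
EMBEDDINGS = {
    "price":    [0.9, 0.1, 0.0],
    "cost":     [0.8, 0.2, 0.1],
    "fee":      [0.7, 0.3, 0.0],
    "schedule": [0.0, 0.9, 0.3],
    "weather":  [0.1, 0.0, 0.9],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def expand_query(terms, k=2):
    """Append the k nearest vocabulary neighbours of each known query term."""
    expanded = list(terms)
    for term in terms:
        if term not in EMBEDDINGS:
            continue  # out-of-vocabulary terms are kept but not expanded
        neighbours = sorted(
            (w for w in EMBEDDINGS if w != term),
            key=lambda w: cosine(EMBEDDINGS[term], EMBEDDINGS[w]),
            reverse=True,
        )
        expanded += [w for w in neighbours[:k] if w not in expanded]
    return expanded


print(expand_query(["price", "today"]))  # ['price', 'today', 'cost', 'fee']
```

In a fuller pipeline, the expanded terms would then be re-weighted or pruned against the top-ranked answer documents, which is the role the abstract assigns to pseudo relevance feedback.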


2012 ◽  
Vol 5s1 ◽  
pp. BII.S8958 ◽  
Author(s):  
Kirk Roberts ◽  
Sanda M. Harabagiu

In this paper we report on the approaches that we developed for the 2011 i2b2 Shared Task on Sentiment Analysis of Suicide Notes. We have cast the problem of detecting emotions in suicide notes as a supervised multi-label classification problem. Our classifiers use a variety of features based on (a) lexical indicators, (b) topic scores, and (c) similarity measures. Our best submission has a precision of 0.551, a recall of 0.485, and an F-measure of 0.516.
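The F-measure reported above is consistent with the harmonic mean of the reported precision and recall. A small sketch of the standard F-beta formula follows; the function name `f_measure` is illustrative.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract above.
print(round(f_measure(0.551, 0.485), 3))  # 0.516
```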


2021 ◽  
Vol 13 (1) ◽  
pp. 1-25
Author(s):  
Michael Loster ◽  
Ioannis Koumarelas ◽  
Felix Naumann

The integration of multiple data sources is a common problem in a large variety of applications. Traditionally, handcrafted similarity measures are used to discover, merge, and integrate multiple representations of the same entity (duplicates) into a large homogeneous collection of data. Often, these similarity measures do not cope well with the heterogeneity of the underlying dataset. In addition, domain experts are needed to manually design and configure such measures, which is time-consuming and requires extensive domain expertise. We propose a deep Siamese neural network capable of learning a similarity measure that is tailored to the characteristics of a particular dataset. With the properties of deep learning methods, we are able to eliminate the manual feature engineering process and thus considerably reduce the effort required for model construction. In addition, we show that it is possible to transfer knowledge acquired during the deduplication of one dataset to another, and thus significantly reduce the amount of data required to train a similarity measure. We evaluated our method on multiple datasets and compared our approach to state-of-the-art deduplication methods. Our approach outperforms competitors by up to +26 percent F-measure, depending on task and dataset. In addition, we show that knowledge transfer is not only feasible, but in our experiments led to an improvement in F-measure of up to +4.7 percent.


Asy-Syari'ah ◽  
2020 ◽  
Vol 22 (1) ◽  
pp. 147-158
Author(s):  
Deden Effendi

Abstract: Waqf law can be categorized as a living law. Even so, efforts are still being made to realize the potential of waqf, which indicates that the living law is not always upheld in theory. This problem can therefore be identified as a problem of waqf law enforcement and is formulated as the question: How does the community comply with the Waqf Law? The study focuses on legal knowledge, legal awareness, and legal obedience with respect to the Waqf Law. Sharia is assumed to be natural law, and thus eternal and unchanging. In the case of waqf, however, sharia does not lay down explicit provisions. The provisions of waqf law are derived through ushul fiqh, using analogical deductive reasoning. Beyond that, the provisions on the mechanism of waqf are derived from human preferences about the public good. Waqf law rests more on ruh al-hukm, the spirit of the teachings, and maqashid al-shariah, so it can more readily be developed to be responsive to the demands and needs of society. The opportunity to enforce waqf law is very large, so that in turn the community will comply with it. This research is a descriptive study that analyzes waqf as a system and as a subsystem of a wider system. The analysis describes the process by which society moves from knowing, to being aware, and finally to complying with waqf law. The data sources are library materials: documents, books, scientific writings, and other relevant information. Data were collected through a literature study, emphasizing the relevance and novelty of the information gathered. The analysis is a content analysis with three steps: data classification, data interpretation, and inference of findings.


Author(s):  
Made Leo Radhitya ◽  
Agus Harjoko

One of the dangers at the beach is the rip current, which poses a significant danger to beachgoers. This paper proposes a method to predict the risk of rip current occurrence using a decision tree generated with the C4.5 algorithm. The output of the decision tree is the rip current occurrence risk. The case study is a beach on Rote Island, Rote Ndao, Nusa Tenggara Timur. The evaluation shows an accuracy of 0.84 and a precision of 0.61; the average recall is 0.68 and the average F-measure is 0.59, each on a scale from 0 to 1.
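C4.5 chooses the attribute to split on by gain ratio, that is, information gain normalised by the split information. The sketch below computes that criterion on a hypothetical four-row dataset. The attribute names (`wave`, `tide`) and their values are illustrative assumptions, not the features used in the paper, and the sketch omits continuous attributes and pruning.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, attribute, target):
    """Information gain of splitting on `attribute`, normalised by split information."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    total = len(rows)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy([row[target] for row in rows]) - remainder
    split_info = entropy([row[attribute] for row in rows])
    return gain / split_info if split_info else 0.0


# Hypothetical observations; attribute names and values are illustrative only.
rows = [
    {"wave": "high", "tide": "low",  "risk": "high"},
    {"wave": "high", "tide": "high", "risk": "high"},
    {"wave": "low",  "tide": "low",  "risk": "low"},
    {"wave": "low",  "tide": "high", "risk": "low"},
]

for attribute in ("wave", "tide"):
    print(attribute, gain_ratio(rows, attribute, "risk"))
```

In this toy data `wave` separates the two risk classes perfectly and `tide` carries no information, so a tree built on this criterion would split on `wave` first.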


2021 ◽  
pp. 172-181
Author(s):  
Oksana Y. Vasileva ◽  
Marina V. Nikulina ◽  
Juri I. Platov

The article addresses the problem of selecting efficient ships in a feasibility study, in which brake power, main dimensions, payload, speed, and fuel consumption are determined. The need for this selection at the initial stage of ship design is justified, and the problems that currently arise are identified. The purpose of the article is to propose a criterion for selecting efficient vessels that is tied to the operating conditions and based on the marginal cost of the ship; a method for determining it is presented. Annual revenues and operating costs should be determined by modern business-planning methods for fleet operation. When searching for the ship's parameters, the optimal fuel consumption is determined. The remaining costs can be found from coefficients tied to fuel consumption and calculated from existing prototypes. The results of calculations by the proposed method are shown, and its merits and the opportunities for improving it when relevant information is available are noted. The article concludes that the proposed option is convenient and applicable for selecting an efficient ship in a feasibility study, based on optimization methods for determining vessel parameters where information technologies are widely used.


Author(s):  
Flavius Frasincar ◽  
Wouter IJntema ◽  
Frank Goossen ◽  
Frederik Hogenboom

News items play an increasingly important role in current business decision processes. Due to the large amount of news published every day, it is difficult to find the news items of interest. One solution to this problem is to employ recommender systems. Traditionally, these recommenders use term extraction methods like TF-IDF combined with the cosine similarity measure. In this chapter, we explore semantic approaches for recommending news items by employing several semantic similarity measures. We have used existing semantic similarities and have also proposed new solutions for computing semantic similarities. Both traditional and semantic recommender approaches, some new, have been implemented in Athena, an extension of the Hermes news personalization framework. Based on the performed evaluation, we conclude that semantic recommender systems in general outperform traditional recommender systems with respect to accuracy, precision, and recall, and that the new semantic recommenders have a better F-measure than existing semantic recommenders.
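The traditional baseline mentioned above, TF-IDF weighting combined with cosine similarity, can be sketched in a few lines. This is a generic illustration, not the Athena or Hermes implementation. The news items are hypothetical, and the first item stands in for a user profile.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Map each tokenised document to a {term: tf * idf} dictionary."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / count) for term, count in df.items()}
    return [{t: c / len(doc) * idf[t] for t, c in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


# Hypothetical tokenised news items; the first one stands in for the user profile.
items = [
    ["bank", "raises", "interest", "rates"],
    ["central", "bank", "holds", "interest", "rates"],
    ["team", "wins", "cup", "final"],
]
profile, *candidates = tf_idf_vectors(items)
scores = [cosine(profile, c) for c in candidates]
print(scores)
```

The finance item scores above zero and the sports item scores exactly zero, because a purely term-based measure sees no overlap. Semantic similarity measures aim to close that gap for items that are related in meaning but share no terms.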


2015 ◽  
Vol 731 ◽  
pp. 231-236
Author(s):  
Wu Xia Ning ◽  
Qiang Wang ◽  
Jin Kai Li ◽  
Feng Wang

Keyword-based online book retrieval cannot fully understand the user's query intent. Query expansion is a typical solution, but recall and precision remain very low with existing methods. In response, this paper presents a semantic query expansion method based on a domain ontology and a local co-occurrence probability model. First, ontology reasoning and concept relatedness calculation are used to obtain the initial expansion terms. Then the local co-occurrence probability model is used to filter the candidate expansion terms, and a filtering function is used for secondary selection. Experimental results show that this method can effectively improve retrieval efficiency.
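The abstract does not define its local co-occurrence probability model, so the following is only one plausible reading, stated as an assumption: estimate how often a candidate term appears in the top-ranked ("local") documents that contain a query term, and keep candidates above a threshold. The function names, the threshold, and the sample documents are all hypothetical.

```python
def cooccurrence_probability(candidate, query_term, docs):
    """Share of documents containing query_term that also contain candidate."""
    with_query = [doc for doc in docs if query_term in doc]
    if not with_query:
        return 0.0
    return sum(candidate in doc for doc in with_query) / len(with_query)


def filter_candidates(candidates, query_terms, docs, threshold=0.5):
    """Keep candidates that co-occur often enough with at least one query term."""
    return [
        c for c in candidates
        if max((cooccurrence_probability(c, q, docs) for q in query_terms), default=0.0) >= threshold
    ]


# Hypothetical top-ranked documents from a first retrieval pass, as term sets.
local_docs = [
    {"python", "programming", "tutorial"},
    {"python", "programming", "cookbook"},
    {"python", "snake", "habitat"},
]
candidates = ["programming", "snake", "gardening"]
print(filter_candidates(candidates, ["python"], local_docs))  # ['programming']
```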

