CONCEPT-BASED TERM WEIGHTING FOR WEB INFORMATION RETRIEVAL

In this paper we present a novel technique for determining term importance by exploiting concept-based information found in ontologies. Calculating term importance is a fundamental aspect of most information retrieval approaches, and it is traditionally determined through inverse document frequency (IDF). We propose concept-based term weighting (CBW), a technique that is fundamentally different from IDF in that it calculates term importance by intuitively interpreting the conceptual information in ontologies. We show that when CBW is used in an approach for web information retrieval on benchmark data, it performs comparably to IDF, with only a 3.5% degradation in retrieval accuracy. Despite this small degradation, the technique is significant because (1) unlike IDF, CBW is independent of document collection statistics, (2) it presents a new way of interpreting ontologies for retrieval, and (3) it introduces an additional source of term importance information that can be used for term weighting.
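To make the contrast with IDF concrete, the following minimal sketch computes the textbook form of inverse document frequency, log(N / df), over a toy collection. It illustrates only the traditional baseline named in the abstract, not CBW itself; the function name `inverse_document_frequency` and the sample documents are assumptions made for illustration.

```python
import math


def inverse_document_frequency(term, documents):
    """Textbook IDF: log(N / df), where df is the number of documents containing the term.

    `documents` is a list of sets of terms (a hypothetical toy representation).
    Returns 0.0 for a term that occurs in no document.
    """
    n_docs = len(documents)
    doc_freq = sum(1 for doc in documents if term in doc)
    if doc_freq == 0:
        return 0.0
    return math.log(n_docs / doc_freq)


# Hypothetical three-document collection.
collection = [
    {"web", "retrieval", "ontology"},
    {"web", "search"},
    {"web", "term", "weighting"},
]

# "web" occurs in every document, so its IDF is log(3/3) = 0.
idf_web = inverse_document_frequency("web", collection)
# "ontology" occurs in one document, so its IDF is log(3/1).
idf_ontology = inverse_document_frequency("ontology", collection)

# Adding a document changes the weight of "ontology": the value depends
# on collection statistics, which is the dependence CBW is said to avoid.
larger_collection = collection + [{"ontology", "concept"}]
idf_ontology_after = inverse_document_frequency("ontology", larger_collection)
```

Because the weight of a term shifts whenever the collection changes, an IDF-based system has to recompute statistics as documents are added or removed.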

Concept-Based Term Weighting for Web Information Retrieval

Sixth International Conference on Computational Intelligence and Multimedia Applications (ICCIMA'05) ◽

10.1109/iccima.2005.20 ◽

2006 ◽

Author(s):

J. Zakos ◽

B. Verma

Keyword(s):

Information Retrieval ◽

Web Information Retrieval ◽

Term Weighting ◽

Web Information

A Bio-inspired Modified PSO Strategy for Effective Web Information Retrieval using RCV1 Datasets

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.j8903.0881019 ◽

2019 ◽

Vol 8 (10) ◽

pp. 779-785

Keyword(s):

Information Retrieval ◽

Fitness Function ◽

System Efficiency ◽

Web Information Retrieval ◽

Key Technology ◽

Query Response Time ◽

Web Information ◽

Document Collection ◽

Tremendous Amount ◽

Modified PSO

Information retrieval is a key technology for accessing the vast amount of data on today's World Wide Web. Challenges arise at various stages of web information retrieval, such as many relevant documents being missed, static user queries, and an ever-changing, tremendous document collection. More powerful strategies are therefore required to search for relevant documents. In this paper, a PSO methodology hybridized with Simulated Annealing (SA) is proposed with the aim of optimizing the Web Information Retrieval (WIR) process. The hybridized PSO has a high impact on reducing the query response time of the system and hence contributes to system efficiency. A novel similarity measure called SMDR acts as the fitness function in the hybridized PSO-SA algorithm. Evaluation measures such as accuracy, MRR, MAP, DCG, IDCG, F-measure and specificity are used to measure the effectiveness of the proposed system and to compare it with the existing system. Experiments are carried out extensively on the large RCV1 collection. The precision-recall rates achieved demonstrate that the proposed system is considerably more effective than the existing one.
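The abstract names several ranking measures without defining them. As a rough guide, the sketch below implements MRR and DCG, with IDCG used to normalise DCG, in their common textbook forms. It is not the paper's evaluation code; the function names and the relevance lists are assumptions, and DCG is shown in the rel / log2(rank + 1) formulation, which is only one of the variants in use.

```python
import math


def reciprocal_rank(relevances):
    """1 / rank of the first relevant result, or 0.0 if none is relevant."""
    for rank, rel in enumerate(relevances, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over a list of per-query relevance lists."""
    return sum(reciprocal_rank(r) for r in runs) / len(runs)


def dcg(relevances):
    """Discounted cumulative gain: sum of rel / log2(rank + 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG divided by the ideal DCG (IDCG) of the same relevance values."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical results for two queries: 1 = relevant, 0 = not relevant,
# listed in the order the system ranked them.
runs = [[0, 1, 0], [1, 0, 0]]
mrr = mean_reciprocal_rank(runs)
score = dcg([1, 0, 1])
normalised = ndcg([1, 0, 1])
```

MRR rewards placing the first relevant document early, while DCG also credits relevant documents further down the ranking at a logarithmic discount.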

Contextual Proximity Based Term-Weighting for Improved Web Information Retrieval

Knowledge Science, Engineering and Management - Lecture Notes in Computer Science ◽

10.1007/978-3-540-76719-0_28 ◽

2007 ◽

pp. 267-278 ◽

Cited By ~ 7

Author(s):

M. P. S. Bhatia ◽

Akshi Kumar Khalid

Keyword(s):

Information Retrieval ◽

Web Information Retrieval ◽

Term Weighting ◽

Web Information

Comparing DBpedia, Wikidata, and YAGO for Web Information Retrieval

Intelligent and Interactive Computing - Lecture Notes in Networks and Systems ◽

10.1007/978-981-13-6031-2_40 ◽

2019 ◽

pp. 525-535 ◽

Cited By ~ 2

Author(s):

Sini Govinda Pillai ◽

Lay-Ki Soon ◽

Su-Cheng Haw

Keyword(s):

Information Retrieval ◽

Web Information Retrieval ◽

Web Information

4th International Workshop on Web Information Retrieval Support Systems (WIRSS 2011)

2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology ◽

10.1109/wi-iat.2011.308 ◽

2011 ◽

Keyword(s):

Information Retrieval ◽

Support Systems ◽

International Workshop ◽

Web Information Retrieval ◽

Web Information

Sistem Rekomendasi Produk Pena Eksklusif Menggunakan Metode Content-Based Filtering dan TF-IDF

JOINTECS (Journal of Information Technology and Computer Science) ◽

10.31328/jointecs.v5i3.1563 ◽

2020 ◽

Vol 5 (3) ◽

pp. 229

Author(s):

Mariani Widia Putri ◽

Achmad Muchayan ◽

Made Kamisutara

Keyword(s):

Information Retrieval ◽

Customer Relationship Management ◽

Relationship Management ◽

Customer Relationship ◽

Brand Awareness ◽

Product Knowledge ◽

Inverse Document Frequency ◽

Term Frequency ◽

Document Frequency ◽

Content Based Filtering

Recommender systems are currently a trend. People now rely more on online transactions, for various personal reasons. Recommender systems offer an easier and faster way to shop, so that users need not spend too much time finding the items they want. Competition among businesses has also changed, so businesses must change their approach in order to reach prospective customers. A system that supports this is therefore needed. In this study, the authors build a product recommender system using Content-Based Filtering and Term Frequency Inverse Document Frequency (TF-IDF) from the Information Retrieval (IR) model, in order to obtain results that are efficient and that meet the need for a solution to improve Customer Relationship Management (CRM). The recommender system is built and deployed as a solution to increase customers' brand awareness and to minimize failed transactions caused by a lack of information that can be conveyed directly or offline. The data consist of 258 product codes, each of which has eight categories and 33 constituent keywords in line with the company's product knowledge. The TF-IDF calculation yields a weight of 13.854 when displaying the top-ranked product recommendation, and achieves an accuracy of 96.5% in recommending pens.
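A content-based recommender built on TF-IDF can be sketched in a few lines. The example below is a minimal illustration, not the authors' system: each product is described by a non-empty list of keywords, every product gets a TF-IDF vector, and recommendations are ranked by cosine similarity. The product keywords and the names `tfidf_vectors`, `cosine` and `recommend` are hypothetical, and the use of cosine similarity is an assumption, since the abstract does not state the similarity measure.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: tf * idf} dict per document.

    tf is the relative term frequency and idf is log(N / df).
    Each document is assumed to be a non-empty list of keywords.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(index, vectors, top_k=2):
    """Indices of the top_k products most similar to product `index`."""
    scores = [(cosine(vectors[index], vec), i)
              for i, vec in enumerate(vectors) if i != index]
    scores.sort(reverse=True)
    return [i for _, i in scores[:top_k]]


# Hypothetical pen catalogue described by keywords.
products = [
    ["fountain", "gold", "luxury"],
    ["fountain", "steel", "office"],
    ["ballpoint", "plastic", "office"],
    ["fountain", "gold", "gift"],
]
vectors = tfidf_vectors(products)
top = recommend(0, vectors, top_k=1)
```

Here product 3 shares both "fountain" and "gold" with product 0, so it ranks first; rarer shared keywords contribute more to the score than common ones.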
