Measuring the Extent of the Synonym Problem in Full-Text Searching

2008 ◽  
Vol 3 (4) ◽  
pp. 18
Author(s):  
Jeffrey Beall ◽  
Karen Kafadar

Objective – This article measures the extent of the synonym problem in full-text searching. The synonym problem occurs when a search misses documents because it was phrased with a synonym rather than a more familiar term. Methods – We took a sample of 90 single-word synonym pairs and searched for each word in the pair, both singly and jointly, in the Yahoo! database. We determined the number of web sites that were missed when one term of a pair, but not the other, appeared in the search field. Results – Depending on how common the synonym is, the percentage of missed web sites varied from almost 0% to almost 100%. A search using a very uncommon synonym ("diaconate") missed a very high percentage of web pages (95%), whereas a search using the more common term ("deacons") missed only 9%. When both terms in a pair were nearly equal in usage ("cooks" and "chefs"), a search on one term but not the other missed almost half the relevant web pages. Conclusion – Our results indicate that search engines would benefit greatly from automatic synonym expansion, not only for user-specified terms but also for high-usage synonyms. They also demonstrate the value of information retrieval systems that use controlled vocabularies and cross references to generate search results.
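The missed-page percentages above follow directly from raw hit counts for single and joint (OR) searches. A minimal sketch of that calculation; the hit counts below are hypothetical illustrations, not the paper's data:

```python
def missed_fraction(hits_single: int, hits_joint: int) -> float:
    """Fraction of relevant pages missed when searching one term only.

    hits_single: pages matching just one term of the synonym pair
    hits_joint:  pages matching either term (term1 OR term2)
    """
    return 1 - hits_single / hits_joint

# Hypothetical counts for one synonym pair searched singly and jointly.
hits_common, hits_rare, hits_either = 910_000, 50_000, 1_000_000

print(round(missed_fraction(hits_rare, hits_either), 2))    # rare synonym misses most pages
print(round(missed_fraction(hits_common, hits_either), 2))  # common term misses few
```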

Author(s):  
Bahare Hashemzahde ◽  
Majid Abdolrazzagh-Nezhad

The accuracy of keyword extraction is a leading factor in information retrieval systems and in marketing. In the real world, text is produced in a variety of languages, and the ability to extract keywords using information from several languages improves extraction accuracy. In this paper, the information available in all languages is used to improve a traditional keyword extraction algorithm on multilingual text. The proposed keyword extraction procedure is an unsupervised algorithm designed to select a word as a keyword of a given text only if, in addition to ranking highly in that language, it also ranks highly on the keyword criteria in the other languages. To this end, the average TF-IDF of each candidate word was calculated over its own language and the other languages, and the words with the highest average TF-IDF were chosen as the extracted keywords. The results obtained indicate that the accuracies of the term frequency-inverse document frequency (TF-IDF) algorithm, the graph-based algorithm, and the improved proposed algorithm on multilingual texts are 80%, 60.65%, and 91.3%, respectively.
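The averaging step can be sketched as follows. The per-language TF-IDF scores are hypothetical, and the alignment of a word with its translations under one canonical key is assumed to be given (the paper does not publish its exact data structures):

```python
def multilingual_keywords(scores_by_lang, top_k=2):
    """Rank candidate words by their TF-IDF averaged over all languages.

    scores_by_lang maps language -> {word: tfidf}; each word's translations
    are assumed to be aligned under one canonical key.
    """
    words = set().union(*(s.keys() for s in scores_by_lang.values()))
    avg = {w: sum(s.get(w, 0.0) for s in scores_by_lang.values()) / len(scores_by_lang)
           for w in words}
    return sorted(avg, key=avg.get, reverse=True)[:top_k]

# Hypothetical TF-IDF scores for the same candidates in two languages.
scores = {
    "en": {"retrieval": 0.42, "keyword": 0.31, "text": 0.05},
    "fa": {"retrieval": 0.38, "keyword": 0.35, "text": 0.02},
}
print(multilingual_keywords(scores))  # → ['retrieval', 'keyword']
```

A word strong in only one language ("text" here) is pulled down by its low scores elsewhere, which is the intended filtering effect.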


Author(s):  
Antonio Picariello

Information retrieval can benefit greatly from users' feedback. The user dimension is therefore a relevant component that must be taken into account when planning and implementing real information retrieval systems. In this chapter, we first describe several concepts related to relevance feedback methods, and then propose a novel information retrieval technique that uses relevance feedback to improve accuracy in an ontology-based system. In particular, we combine semantic information from a general knowledge base with statistical information using relevance feedback. Several experiments and results are presented using a test set of Web pages.
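The statistical side of relevance feedback is classically realized by Rocchio's query update; a minimal vector-space sketch with made-up weights (the chapter's method additionally uses ontology semantics, which is not shown here):

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update: move the query vector toward the centroid of
    relevant documents and away from non-relevant ones.
    All vectors are equal-length lists of term weights."""
    n_r, n_nr = len(relevant), len(nonrelevant)
    updated = []
    for i, q in enumerate(query):
        centroid_r = sum(d[i] for d in relevant) / n_r if n_r else 0.0
        centroid_nr = sum(d[i] for d in nonrelevant) / n_nr if n_nr else 0.0
        updated.append(alpha * q + beta * centroid_r - gamma * centroid_nr)
    return updated

# Toy 3-term vocabulary; one relevant and one non-relevant judged document.
q = [1.0, 0.0, 0.0]
print(rocchio(q, relevant=[[0.0, 1.0, 0.0]], nonrelevant=[[0.0, 0.0, 1.0]]))
```

The updated query gains weight on terms from judged-relevant pages, so a re-run retrieves more documents like them.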


Author(s):  
Antonio Picariello ◽  
Antonio M. Rinaldi

The user dimension is a crucial component of the information retrieval process, and for this reason it must be taken into account when planning and implementing information retrieval systems. In this paper we present a technique based on relevance feedback to improve accuracy in an ontology-based information retrieval system. Our method combines semantic information from a general knowledge base with statistical information using relevance feedback. Several experiments and results are presented using a test set of Web pages.


2014 ◽  
Vol 602-605 ◽  
pp. 3706-3711
Author(s):  
Hao Chen ◽  
Qin Qun Chen ◽  
Shao Xia YE

In recent years, much research has been devoted to the analysis of 128-bit architectures; on the other hand, few have evaluated the construction of wide-area networks. In fact, few cyberneticists would disagree with the understanding of IPv6. This is an important point to understand. We describe an autonomous tool for developing compilers, which we call ADZ.


Author(s):  
Maria Indrawan ◽  
Seng Loke

The debate on the effectiveness of ontology in solving semantic problems has intensified recently in many domains of information technology. One side of the debate accepts ontology as a suitable solution. The other side argues that ontology is far from an ideal solution to the semantic problem. This article explores this debate in the area of information retrieval. Several past approaches were examined, and a new approach was investigated to test the effectiveness of a generic ontology such as WordNet in improving the performance of information retrieval systems. The tests and the analysis of the experiments suggest that WordNet is far from an ideal solution to semantic problems in information retrieval. However, several observations are made and reported in this article that allow research on ontology for information retrieval to move in the right direction.
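The generic-ontology approach under test amounts to expanding queries with terms drawn from WordNet-style synonym sets. A minimal sketch using a hand-made synonym table in place of a real WordNet lookup; the table entries are illustrative assumptions:

```python
# Hypothetical miniature synonym table standing in for WordNet synsets.
SYNONYMS = {
    "car": {"automobile", "auto"},
    "doctor": {"physician"},
}

def expand_query(terms):
    """Add every known synonym of each query term. This expansion is
    sense-blind: it cannot tell which meaning the user intended, which is
    one reason generic ontologies can hurt retrieval precision."""
    expanded = set(terms)
    for t in terms:
        expanded |= SYNONYMS.get(t, set())
    return expanded

print(sorted(expand_query(["car", "crash"])))  # → ['auto', 'automobile', 'car', 'crash']
```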


Author(s):  
José Antonio García-Díaz ◽  
Rafael Valencia-García

Satirical content on social media is hard to distinguish from real news, misinformation, hoaxes or propaganda when there are no clues as to the medium in which the news was originally written. It is important, therefore, to provide information retrieval systems with mechanisms to identify which results are legitimate and which are misleading. Our contribution to satire identification is twofold. On the one hand, we release the Spanish SatiCorpus 2021, a balanced dataset that contains satirical and non-satirical documents. On the other hand, we conduct an extensive evaluation of this dataset with linguistic features and embedding-based features. All feature sets are evaluated separately and combined using different strategies. Our best result, an accuracy of 97.405%, is achieved with a combination of the linguistic features and BERT. In addition, we compare our proposal with existing Spanish datasets on satire and irony.
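Two common ways to combine feature sets of this kind are early fusion (concatenating feature vectors before classification) and late fusion (averaging the scores of separately trained models). A minimal sketch; the vectors and scores below are made up, and BERT itself is not invoked — the paper does not specify which fusion strategy produced its best result:

```python
def early_fusion(linguistic, embedding):
    """Concatenate linguistic features with embedding features so that a
    single downstream classifier sees both views of the document."""
    return linguistic + embedding

def late_fusion(scores):
    """Average the satire scores predicted by separately trained models."""
    return sum(scores) / len(scores)

ling = [0.2, 0.7]            # hypothetical linguistic feature vector
emb = [0.1, 0.4, 0.9]        # hypothetical slice of an embedding vector
combined = early_fusion(ling, emb)   # 5-dimensional combined vector
print(combined)
print(late_fusion([0.92, 0.88]))     # combined satire score from two models
```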


2020 ◽  
Vol 40 (02) ◽  
pp. 437-444
Author(s):  
Padmavathi T

Current methods of searching and information retrieval are imprecise, often yielding results spread over tens of thousands of web pages, and extracting the information actually needed often requires extensive manual browsing of the retrieved documents. To address these drawbacks, this paper introduces an implementation of an ontology-based information retrieval system in the field of food science and compares it with conventional information systems. The ontology of the Food Semantic Web Knowledge Base (FSWKB) was built with the Protégé framework, which supports two main ontology models through the editors Protégé-Frames and Protégé-OWL. The FSWKB is composed of two heterogeneous ontologies, which are merged and processed on a separate server application using Apache Jena Fuseki, a SPARQL server offering a SPARQL endpoint. The experimental results indicated that ontology-based information systems are more effective in their retrieval capability than conventional information retrieval systems. Retrieval effectiveness was measured in terms of precision and recall: traditional search achieved average precision and recall of 0.92 and 0.18, while the ontology-based test achieved average precision and recall of 0.96 and 0.97.
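The effectiveness figures rest on the standard definitions of precision and recall over retrieved and relevant document sets. A minimal sketch with hypothetical result sets (not the paper's data):

```python
def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved documents that are relevant.
    Recall: fraction of relevant documents that were retrieved."""
    hits = len(retrieved & relevant)
    return hits / len(retrieved), hits / len(relevant)

# Hypothetical result sets for a single query.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d2", "d5"}
p, r = precision_recall(retrieved, relevant)
print(p, r)  # precision 0.5, recall ≈ 0.67
```

A system like the traditional baseline above scores high precision but very low recall when it returns few of the many relevant documents.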


2016 ◽  
pp. 044-050
Author(s):  
A.M. Glybovets

Methods of data extraction and analysis form a relatively new and promising branch of computer science that has found application in information retrieval systems. We present an algorithm for searching for relationships and dependencies in collections of Web pages. The algorithm does not itself retrieve relevant resources; this function is performed by the search engine, which also carries out cleaning, integration, and data selection. Distinctive features of the algorithm are its use of an existing data store (a search engine or data storage), language independence, and ease of implementation.


2020 ◽  
Vol 10 (3) ◽  
pp. 57-73
Author(s):  
Prem Sagar Sharma ◽  
Divakar Yadav

Web-based information retrieval systems called search engines have made things easy for information seekers, but they still provide no guarantees about the relevance of the information returned to users. Information retrieval systems provide information to the user based on certain retrieval criteria. Because of the large size of the WWW, it is very common for a large number of documents to be identified as related to a particular domain. Therefore, to help users find the best matching documents, search engines employ a ranking mechanism. In this article, an improved architecture for an information retrieval system is proposed. The proposed system keeps a query log for each user query and stores the results returned to the user for that query. The system also provides relevant results by analyzing the content of the pages retrieved for the user query.
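The query log described above can be sketched as a per-query store of previously served results that later lookups consult. The class name, fields, and the boost-by-history reranking rule here are illustrative assumptions, not the paper's exact design:

```python
from collections import defaultdict

class QueryLog:
    """Store the result list served for each user query and reuse it to
    boost previously served pages when the same query recurs."""
    def __init__(self):
        self.served = defaultdict(list)   # query -> list of served result lists

    def record(self, query, results):
        self.served[query].append(results)

    def rerank(self, query, candidates):
        # Stable sort: candidates seen in earlier result lists for this
        # query come first, otherwise original order is preserved.
        seen = {url for results in self.served[query] for url in results}
        return sorted(candidates, key=lambda url: url not in seen)

log = QueryLog()
log.record("ontology ir", ["a.com", "b.com"])
print(log.rerank("ontology ir", ["c.com", "b.com", "a.com"]))  # → ['b.com', 'a.com', 'c.com']
```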

