An Effective Preprocessing Algorithm for Information Retrieval System

2019 ◽  
Vol 8 (3) ◽  
pp. 6371-6375

The growth of the web has produced a huge amount of information, driven by Internet users being able to post their assessments, remarks, and reviews online. Preprocessing helps an Information Retrieval (IR) system understand a user query. An IR system acts as a container for representing, searching, and accessing information that relates to a user's search string. That information is expressed in natural language rather than in a structured format, and the words used are often ambiguous. One of the major challenges in current web search is the vocabulary mismatch problem that arises during preprocessing. A further drawback of IR systems in web search is that the relationships between the query expressions and the expanded terms are limited. The query expressions are the search terms used to fetch information from the IR system; the expanded terms are obtained by adding the terms most similar to the words of the search string. In this manuscript, we focus mainly on what lies behind the user's search string on the web. We identify the best features in this context for term selection in a supervised-learning-based model. The proposed system focuses on preprocessing techniques such as tokenization, stemming, spell checking, finding dissimilar words, and discovering the keywords in the user query, because these provide better results for the user.
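The preprocessing steps named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the stop-word list, the vocabulary, the 0.85 similarity cutoff, and the suffix-stripping rule are all hypothetical stand-ins chosen for illustration, and stop-word filtering is used here only as a rough proxy for keyword discovery.

```python
import re
from difflib import get_close_matches

# Hypothetical, deliberately tiny resources (assumptions, not from the paper).
STOP_WORDS = {"the", "a", "an", "of", "for", "in", "on", "to", "and", "is"}
VOCABULARY = {"information", "retrieval", "system", "search", "query", "web"}


def tokenize(query):
    """Lowercase the query and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", query.lower())


def correct_spelling(token, vocabulary=VOCABULARY):
    """Replace a token with its closest vocabulary entry if one is similar enough."""
    if token in vocabulary:
        return token
    matches = get_close_matches(token, vocabulary, n=1, cutoff=0.85)
    return matches[0] if matches else token


def naive_stem(token):
    """Strip a few common English suffixes (a crude stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(query):
    """Tokenize, spell-check, drop stop words, then stem the remaining tokens."""
    tokens = tokenize(query)
    tokens = [correct_spelling(t) for t in tokens]
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [naive_stem(t) for t in tokens]


if __name__ == "__main__":
    print(preprocess("Searching the web for informaton retrieval systems"))
```

A production system would more plausibly use an established stemmer and a full dictionary; the point here is only the order of the stages.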

Author(s):  
Ricardo Baeza-Yates ◽  
Roi Blanco ◽  
Malú Castellanos

Web search has become a ubiquitous commodity for Internet users. This fact puts a large number of documents with plenty of text content at our fingertips. To make good use of this data, we need to mine web text. This triggers the two problems covered here: sentiment analysis and entity retrieval in the context of the Web. The first problem answers the question of what people think about a given product or a topic, in particular sentiment analysis in social media. The second problem addresses the issue of solving certain enquiries precisely by returning a particular object: for instance, where the next concert of my favourite band will be or who the best cooks are in a particular region. Where to find these objects and how to retrieve, rank, and display them are tasks related to the entity retrieval problem.


Author(s):  
R. Subhashini ◽  
V.Jawahar Senthil Kumar

The World Wide Web is a large distributed digital information space. The ability to search and retrieve information from the Web efficiently and effectively is an enabling technology for realizing its full potential. Information Retrieval (IR) plays an important role in search engines. Today’s most advanced engines use the keyword-based (“bag of words”) paradigm, which has inherent disadvantages. Organizing web search results into clusters helps the user browse search results quickly. Traditional clustering techniques are inadequate because they do not generate clusters with highly readable names. This paper proposes an approach to clustering web search results based on a phrase-based clustering algorithm. It is an alternative to the single ordered result list of search engines: the approach presents a list of clusters to the user. Experimental results verify the method’s feasibility and effectiveness.


Author(s):  
Esharenana E. Adomi

The World Wide Web (WWW) has led to the advent of the information age. With increased demand for information from various quarters, the Web has turned out to be a veritable resource. Web surfers in the early days were frustrated by the delay in finding the information they needed. The first major leap for information retrieval came from the deployment of Web search engines such as Lycos, Excite, AltaVista, etc. The rapid growth in the popularity of the Web during the past few years has led to a precipitous pronouncement of death for the online services that preceded the Web in the wired world.


Author(s):  
Ji-Rong Wen

The Web is an open and free environment for people to publish and get information. Everyone on the Web can be either an author, a reader, or both. The language of the Web, HTML (Hypertext Markup Language), is mainly designed for information display, not for semantic representation. Therefore, current Web search engines usually treat Web pages as unstructured documents, and traditional information retrieval (IR) technologies are employed for Web page parsing, indexing, and searching. The unstructured essence of Web pages seriously blocks more accurate search and advanced applications on the Web. For example, many sites contain structured information about various products. Extracting and integrating product information from multiple Web sites could lead to powerful search functions, such as comparison shopping and business intelligence. However, these structured data are embedded in Web pages, and there are no proper traditional methods to extract and integrate them. Another example is the link structure of the Web. If used properly, information hidden in the links could be taken advantage of to effectively improve search performance and make Web search go beyond traditional information retrieval (Page, Brin, Motwani, & Winograd, 1998; Kleinberg, 1998).


Author(s):  
Ji-Rong Wen

A Web query log is a file that keeps track of the activities of the users of a search engine. Compared to the traditional information retrieval setting, in which documents are the only information source available, query logs are an additional information source in the Web search setting. Based on query logs, a set of Web mining techniques, such as log-based query clustering, log-based query expansion, collaborative filtering, and personalized search, could be employed to improve the performance of Web search.
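As an illustration of log-based query clustering, the sketch below groups queries whose users clicked on overlapping sets of documents. It is one simple reading of the idea, not the method of the article: the log entries, the Jaccard similarity measure, the 0.5 threshold, and the greedy single-pass grouping are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical query log entries: (query, clicked_url).
LOG = [
    ("cheap flights", "travel.example/flights"),
    ("low cost airfare", "travel.example/flights"),
    ("low cost airfare", "deals.example/air"),
    ("python tutorial", "docs.example/python"),
    ("learn python", "docs.example/python"),
]


def clicks_by_query(log):
    """Map each query to the set of URLs clicked for it."""
    clicks = defaultdict(set)
    for query, url in log:
        clicks[query].add(url)
    return clicks


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_queries(log, threshold=0.5):
    """Greedily add each query to the first cluster holding a similar query."""
    clicks = clicks_by_query(log)
    clusters = []
    for query in sorted(clicks):
        for cluster in clusters:
            if any(jaccard(clicks[query], clicks[other]) >= threshold
                   for other in cluster):
                cluster.append(query)
                break
        else:
            # No existing cluster is similar enough, so start a new one.
            clusters.append([query])
    return clusters


if __name__ == "__main__":
    for cluster in cluster_queries(LOG):
        print(cluster)
```

Note that the queries in each resulting cluster share no words, which is the kind of vocabulary gap that click information can bridge and document text alone cannot.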


2019 ◽  
Vol 31 (5) ◽  
pp. 480-491
Author(s):  
Carole Rodon ◽  
Anne Congard

Searching for information on the web is regarded as a complex problem-solving activity involving a range of cognitive and affective processes. Anxiety is a key affective factor. In this article, we describe the construction and initial validation stages of the Information Retrieval on the Web Anxiety Rate (IROWAR) scale. The final structure of this inventory was validated with a sample of 183 English-speaking Internet users. Reliability analyses indicated that the factors were internally consistent (Cronbach’s alpha: 0.92). When we checked divergent validity, we found negative correlations with both self-efficacy and positive attitude towards the Internet. There were no effects of either sex or age on the total IROWAR score, but the Internet search anxiety sum score decreased with the length of use. This scale will be useful in several domains, including research on the determinants of web anxiety, individuals’ experience of web anxiety and ways of supporting them, and Internet learning.
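For readers unfamiliar with the reliability statistic cited above, Cronbach's alpha for a scale of k items is k/(k-1) multiplied by one minus the ratio of the summed item variances to the variance of the total scores. The Python sketch below computes it with the standard formula; the function name and the toy response matrix are hypothetical and are not the IROWAR data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])  # number of items in the scale
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the respondents' total scores.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Toy data (assumption): three respondents, two perfectly correlated items.
    toy_scores = [[1, 1], [2, 2], [3, 3]]
    print(cronbach_alpha(toy_scores))
```

Values close to 1 indicate that the items vary together, which is what an alpha of 0.92 signals for the scale described here.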


2019 ◽  
Vol 9 (6) ◽  
pp. 1181-1190 ◽  
Author(s):  
Mohib Ullah ◽  
Muhammad Arshad Islam ◽  
Rafiullah Khan ◽  
Muhammad Aleem ◽  
Muhammad Azhar Iqbal

Users around the world send queries to the Web Search Engine (WSE) to retrieve data from the Internet. Users often turn first to the WSE for medical information. Search queries relating to diseases and treatments are considered to be the most personal facts about the user. Search queries often contain identifiable information that can be linked back to the originator, which can compromise the user's privacy. In this work, we propose a distributed privacy-preserving protocol (OSLo) that eliminates limitations in the existing distributed privacy-preserving protocols, together with a framework that evaluates the privacy of a user. The OSLo framework assesses the local privacy relative to the group of users involved in forwarding the query to the WSE, and the profile privacy against profiling by the WSE. The privacy analysis shows that the local privacy of a user depends directly on the size of the group and inversely on the number of compromised users. We performed experiments to evaluate the profile privacy of a user using the privacy metric Profile Exposure Level. OSLo is simulated with a subset of 1000 users of the AOL query log. The results show that OSLo performs better than the benchmark privacy-preserving protocol in terms of privacy and delay. Additionally, the results show that the privacy of a user depends on the size of the group.


Author(s):  
Sunny Sharma ◽  
Sunita Sunita ◽  
Arjun Kumar ◽  
Vijay Rana

The emergence of Web technology has generated a massive amount of raw data by enabling Internet users to post their opinions, comments, and reviews on the web. Extracting useful information from this raw data can be a very challenging task. Search engines play a critical role in these circumstances. User queries are becoming a main issue for search engines; therefore, a preprocessing operation is essential. In this paper, we present a framework for natural language preprocessing for efficient data retrieval, along with some of the processing required for effective retrieval, such as elongated word handling, stop word removal, and stemming. This manuscript starts by building a manually annotated dataset and then takes the reader through the detailed steps of the process. Experiments are conducted for particular stages of this process to examine the accuracy of the system.
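Elongated word handling, one of the steps listed in this abstract, is commonly approached by collapsing long runs of a repeated character. The sketch below shows one such rule in Python; the function name and the choice to shorten any run of three or more identical characters to two are assumptions for illustration, not the framework described in the paper.

```python
import re

# Matches any word character repeated three or more times in a row.
_ELONGATION = re.compile(r"(\w)\1{2,}")


def normalize_elongated(text):
    """Shorten runs of 3+ identical characters to exactly two characters."""
    return _ELONGATION.sub(r"\1\1", text)


if __name__ == "__main__":
    print(normalize_elongated("this is sooooo goooood"))
```

Keeping two characters rather than one preserves legitimate double letters ("good", "cool"); a dictionary lookup or spell checker would still be needed afterwards to map forms such as "soo" to "so".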


Author(s):  
Cédric Pruski ◽  
Nicolas Guelfi ◽  
Chantal Reynaud

Finding relevant information on the Web is difficult for most users. Although Web search applications are improving, they must be more “intelligent” to adapt to the search domains targeted by queries, the evolution of these domains, and users’ characteristics. In this paper, the authors present the TARGET framework for Web Information Retrieval. The proposed approach relies on the use of ontologies of a particular nature, called adaptive ontologies, for representing both the search domain and a user’s profile. Unlike existing approaches on ontologies, the authors make adaptive ontologies adapt semi-automatically to the evolution of the modeled domain. The ontologies and their properties are exploited for domain specific Web search purposes. The authors propose graph-based data structures for enriching Web data in semantics, as well as define an automatic query expansion technique to adapt a query to users’ real needs. The enriched query is evaluated on the previously defined graph-based data structures representing a set of Web pages returned by a usual search engine in order to extract the most relevant information according to user needs. The overall TARGET framework is formalized using first-order logic and fully tool supported.

