Exploring Topic-language Preferences in Multilingual Swahili Information Retrieval in Tanzania

Author(s):  
Joseph P. Telemala ◽  
Hussein Suleman

Habitual switching of languages is a common behaviour among polyglots when searching for information on the Web. Studies in information retrieval (IR) and multilingual information retrieval (MLIR) suggest that part of the reason for such regular switching of languages is the topic of search. Unlike survey-based studies, this study uses query and click-through logs. It exploits the querying and results selection behaviour of Swahili MLIR system users to explore how topic of search (query) is associated with language preferences—topic-language preferences. This article is based on a carefully controlled study using Swahili-speaking Web users in Tanzania who interacted with a guided multilingual search engine. From the statistical analysis of queries and click-through logs, it was revealed that language preferences may be associated with the topics of search. The results also suggest that language preferences are not static; they vary along the course of Web search from query to results selection. In most of the topics, users either had significantly no language preference or preferred to query in Kiswahili and changed their preference to either English or no preference for language when selecting/clicking on the results. The findings of this study might provide researchers with more insights in developing better MLIR systems that support certain types of users and in certain scenarios.

2021 ◽  
pp. 1-11
Author(s):  
Zhinan Gou ◽  
Yan Li

With the development of the web 2.0 communities, information retrieval has been widely applied based on the collaborative tagging system. However, a user issues a query that is often a brief query with only one or two keywords, which leads to a series of problems like inaccurate query words, information overload and information disorientation. The query expansion addresses this issue by reformulating each search query with additional words. By analyzing the limitation of existing query expansion methods in folksonomy, this paper proposes a novel query expansion method, based on user profile and topic model, for search in folksonomy. In detail, topic model is constructed by variational antoencoder with Word2Vec firstly. Then, query expansion is conducted by user profile and topic model. Finally, the proposed method is evaluated by a real dataset. Evaluation results show that the proposed method outperforms the baseline methods.


Author(s):  
R. Subhashini ◽  
V.Jawahar Senthil Kumar

The World Wide Web is a large distributed digital information space. The ability to search and retrieve information from the Web efficiently and effectively is an enabling technology for realizing its full potential. Information Retrieval (IR) plays an important role in search engines. Today’s most advanced engines use the keyword-based (“bag of words”) paradigm, which has inherent disadvantages. Organizing web search results into clusters facilitates the user’s quick browsing of search results. Traditional clustering techniques are inadequate because they do not generate clusters with highly readable names. This paper proposes an approach for web search results in clustering based on a phrase based clustering algorithm. It is an alternative to a single ordered result of search engines. This approach presents a list of clusters to the user. Experimental results verify the method’s feasibility and effectiveness.


Author(s):  
Esharenana E. Adomi

The World Wide Web (WWW) has led to the advent of the information age. With increased demand for information from various quarters, the Web has turned out to be a veritable resource. Web surfers in the early days were frustrated by the delay in finding the information they needed. The first major leap for information retrieval came from the deployment of Web search engines such as Lycos, Excite, AltaVista, etc. The rapid growth in the popularity of the Web during the past few years has led to a precipitous pronouncement of death for the online services that preceded the Web in the wired world.


Author(s):  
Ji-Rong Wen

The Web is an open and free environment for people to publish and get information. Everyone on the Web can be either an author, a reader, or both. The language of the Web, HTML (Hypertext Markup Language), is mainly designed for information display, not for semantic representation. Therefore, current Web search engines usually treat Web pages as unstructured documents, and traditional information retrieval (IR) technologies are employed for Web page parsing, indexing, and searching. The unstructured essence of Web pages seriously blocks more accurate search and advanced applications on the Web. For example, many sites contain structured information about various products. Extracting and integrating product information from multiple Web sites could lead to powerful search functions, such as comparison shopping and business intelligence. However, these structured data are embedded in Web pages, and there are no proper traditional methods to extract and integrate them. Another example is the link structure of the Web. If used properly, information hidden in the links could be taken advantage of to effectively improve search performance and make Web search go beyond traditional information retrieval (Page, Brin, Motwani, & Winograd, 1998, Kleinberg, 1998).


Author(s):  
Ji-Rong Wen

Web query log is a type of file keeping track of the activities of the users who are utilizing a search engine. Compared to traditional information retrieval setting in which documents are the only information source available, query logs are an additional information source in the Web search setting. Based on query logs, a set of Web mining techniques, such as log-based query clustering, log-based query expansion, collaborative filtering and personalized search, could be employed to improve the performance of Web search.


Author(s):  
Ji-Rong Wen

Web query log is a type of file keeping track of the activities of the users who are utilizing a search engine. Compared to traditional information retrieval setting in which documents are the only information source available, query logs are an additional information source in the Web search setting. Based on query logs, a set of Web mining techniques, such as log-based query clustering, log-based query expansion, collaborative filtering and personalized search, could be employed to improve the performance of Web search.


Author(s):  
Cédric Pruski ◽  
Nicolas Guelfi ◽  
Chantal Reynaud

Finding relevant information on the Web is difficult for most users. Although Web search applications are improving, they must be more “intelligent” to adapt to the search domains targeted by queries, the evolution of these domains, and users’ characteristics. In this paper, the authors present the TARGET framework for Web Information Retrieval. The proposed approach relies on the use of ontologies of a particular nature, called adaptive ontologies, for representing both the search domain and a user’s profile. Unlike existing approaches on ontologies, the authors make adaptive ontologies adapt semi-automatically to the evolution of the modeled domain. The ontologies and their properties are exploited for domain specific Web search purposes. The authors propose graph-based data structures for enriching Web data in semantics, as well as define an automatic query expansion technique to adapt a query to users’ real needs. The enriched query is evaluated on the previously defined graph-based data structures representing a set of Web pages returned by a usual search engine in order to extract the most relevant information according to user needs. The overall TARGET framework is formalized using first-order logic and fully tool supported.


2012 ◽  
pp. 217-238 ◽  
Author(s):  
Orland Hoeber

People commonly experience difficulties when searching the Web, arising from an incomplete knowledge regarding their information needs, an inability to formulate accurate queries, and a low tolerance for considering the relevance of the search results. While simple and easy to use interfaces have made Web search universally accessible, they provide little assistance for people to overcome the difficulties they experience when their information needs are more complex than simple fact-verification. In human-centred Web search, the purpose of the search engine expands from a simple information retrieval engine to a decision support system. People are empowered to take an active role in the search process, with the search engine supporting them in developing a deeper understanding of their information needs, assisting them in crafting and refining their queries, and aiding them in evaluating and exploring the search results. In this chapter, recent research in this domain is outlined and discussed.


2018 ◽  
Vol 15 (2) ◽  
pp. 595-600
Author(s):  
R. Sathish Kumar ◽  
M. Chandrasekaran

Web query classification, the task of inferring topical categories from a web search query is a non-trivial problem in Information Retrieval domain. The topic categories inferred by a Web query classification system may provide a rich set of features for improving query expansion and web advertising. Conventional methods for Web query classification derive corpus statistics from the web and employ machine-learning techniques to infer Open Directory Project categories. But they suffer from two major drawbacks, the computational overhead to derive corpus statistics and inferring topic categories that are too abstract for semantic discrimination due to polysemy. Concepts too shallow or too deep in the semantic gradient are produced due to the wrong senses of the query terms coalescing with the correct senses. This paper proposes and demonstrates a succinct solution to these problems through a method based on the Tree cut model and Wordnet Thesarus to infer fine-grained topic categories for Web query classification, and also suggests an enhancement to the Tree Cut Model to resolve sense ambiguities.


2017 ◽  
Vol 26 (06) ◽  
pp. 1730002 ◽  
Author(s):  
T. Dhiliphan Rajkumar ◽  
S. P. Raja ◽  
A. Suruliandi

Short and ambiguous queries are the major problems in search engines which lead to irrelevant information retrieval for the users’ input. The increasing nature of the information on the web also makes various difficulties for the search engine to provide the users needed results. The web search engine experience the ill effects of ambiguity, since the queries are looked at on a rational level rather than the semantic level. In this paper, for improving the performance of search engine as of the users’ interest, personalization is based on the users’ clicks and bookmarking is proposed. Modified agglomerative clustering is used in this work for clustering the results. The experimental results prove that the proposed work scores better precision, recall and F-score.


Sign in / Sign up

Export Citation Format

Share Document