Finding Information Faster by Tracing My Colleagues' Trails

Author(s):  
Patrick Winter ◽  
Michael Schulz ◽  
Tobias H. Engler

Knowledge workers in enterprises face the challenge of retrieving information efficiently, and inefficient retrieval is one of the most important barriers to knowledge reuse. This problem has been intensified in recent years by organizational developments such as growing data volumes and a growing number of data sources. In this chapter, a reference algorithm for enterprise search is developed that integrates aspects of personalized, social, collaborative, and dynamic search to account for the different natures and requirements of enterprise and web search. Because the algorithm is modular, enterprises can easily adapt it to their specific circumstances by concretizing its components. The components that can be configured during the adaptation process are discussed. Furthermore, the performance of a typical instance of the algorithm is investigated in a laboratory experiment. This instance is found to outperform more traditional approaches to enterprise search.

Author(s):  
Amit Singh ◽  
Aditi Sharan

This article describes how semantic web data sources follow linked data principles to facilitate efficient information retrieval and knowledge sharing. These data sources may provide complementary, overlapping, or contradictory information. To integrate these data sources, the authors perform entity linking, the task of identifying and linking entities across data sources that refer to the same real-world entity. They propose a genetic fuzzy approach to learning linkage rules for entity linking. The method is domain independent, automatic, and scalable. It uses fuzzy logic to adapt the mutation and crossover rates of genetic programming to ensure guided convergence. The authors' experimental evaluation demonstrates that their approach is competitive with, and makes significant improvements over, state-of-the-art methods.


2021 ◽  
Vol 55 (1) ◽  
pp. 1-2
Author(s):  
Bhaskar Mitra

Neural networks with deep architectures have demonstrated significant performance improvements in computer vision, speech recognition, and natural language processing. The challenges in information retrieval (IR), however, are different from those in these other application areas. A common form of IR involves ranking documents (or short passages) in response to keyword-based queries. Effective IR systems must deal with the query-document vocabulary mismatch problem by modeling relationships between different query and document terms and how they indicate relevance. Models should also consider lexical matches when the query contains rare terms not seen during training (such as a person's name or a product model number), and should avoid retrieving semantically related but irrelevant results. In many real-life IR tasks, retrieval involves extremely large collections, such as the document index of a commercial Web search engine, containing billions of documents. Efficient IR methods should take advantage of specialized IR data structures, such as the inverted index, to retrieve efficiently from large collections. Given an information need, the IR system also mediates how much exposure an information artifact receives by deciding whether it should be displayed, and where it should be positioned, among other results. Exposure-aware IR systems may optimize for additional objectives besides relevance, such as parity of exposure for retrieved items and content publishers. In this thesis, we present novel neural architectures and methods motivated by the specific needs and challenges of IR tasks. We ground our contributions with a detailed survey of the growing body of neural IR literature [Mitra and Craswell, 2018].
Our key contribution towards improving the effectiveness of deep ranking models is the Duet principle [Mitra et al., 2017], which emphasizes the importance of incorporating evidence based on both patterns of exact term matches and similarities between learned latent representations of query and document. To efficiently retrieve from large collections, we develop a framework to incorporate query term independence [Mitra et al., 2019] into any deep model, which enables large-scale precomputation and the use of an inverted index for fast retrieval. In the context of stochastic ranking, we further develop optimization strategies for exposure-based objectives [Diaz et al., 2020]. Finally, this dissertation also summarizes our contributions towards benchmarking neural IR models in the presence of large training datasets [Craswell et al., 2019] and explores the application of neural methods to other IR tasks, such as query auto-completion.
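
The query term independence idea mentioned above can be illustrated with a minimal sketch: if a document's score is a sum of per-term scores, each term-document score can be precomputed and stored in an inverted index. Everything here is an assumption for illustration, not the thesis's implementation. In particular, `term_doc_score` is a hypothetical stand-in (a simple relative term frequency) for whatever learned model would produce the per-term score.

```python
from collections import defaultdict


def term_doc_score(term, doc_tokens):
    # Hypothetical stand-in for a learned per-term scoring model:
    # here simply the relative frequency of the term in the document.
    return doc_tokens.count(term) / len(doc_tokens)


def build_index(docs):
    # Precompute one score per (term, document) pair and store it in an
    # inverted index: term -> {doc_id: score}.
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        for term in set(tokens):
            index[term][doc_id] = term_doc_score(term, tokens)
    return index


def search(index, query, k=3):
    # Query term independence: the document score is the sum of the
    # precomputed per-term scores, so only posting lists are touched.
    totals = defaultdict(float)
    for term in query.lower().split():
        for doc_id, score in index.get(term, {}).items():
            totals[doc_id] += score
    # Sort by descending score, breaking ties by document id.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

At query time no model is evaluated; the cost depends only on the lengths of the posting lists for the query terms.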


2013 ◽  
Vol 2013 ◽  
pp. 1-10
Author(s):  
Lei Luo ◽  
Chao Zhang ◽  
Yongrui Qin ◽  
Chunyuan Zhang

With the explosive growth of the data volume in modern applications such as web search and multimedia retrieval, hashing is becoming increasingly important for efficient nearest neighbor (similar item) search. Recently, a number of data-dependent methods have been developed, reflecting the great potential of learning for hashing. Inspired by the classic nonlinear dimensionality reduction algorithm—maximum variance unfolding, we propose a novel unsupervised hashing method, named maximum variance hashing, in this work. The idea is to maximize the total variance of the hash codes while preserving the local structure of the training data. To solve the derived optimization problem, we propose a column generation algorithm, which directly learns the binary-valued hash functions. We then extend it using anchor graphs to reduce the computational cost. Experiments on large-scale image datasets demonstrate that the proposed method outperforms state-of-the-art hashing methods in many cases.
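
To make the role of binary hash codes in nearest neighbor search concrete, here is a minimal sketch. It does not implement maximum variance hashing; it swaps in random hyperplane hashing (a data-independent method) purely to show how items are mapped to bit codes and compared by Hamming distance. All function names are hypothetical.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    # One random Gaussian hyperplane per output bit (an assumption;
    # a learned method would fit these projections to the data instead).
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(vec, planes):
    # Each bit records on which side of a hyperplane the vector lies.
    code = 0
    for plane in planes:
        dot = sum(p * x for p, x in zip(plane, vec))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    # Number of differing bits between two integer-encoded codes.
    return bin(a ^ b).count("1")


def nearest(query, database, planes):
    # Index of the database item whose code is closest in Hamming distance
    # (the first such index if several items tie).
    q = hash_code(query, planes)
    return min(
        range(len(database)),
        key=lambda i: hamming(q, hash_code(database[i], planes)),
    )
```

In practice the database codes would be computed once and stored, so that a query only requires one hashing step followed by cheap bitwise comparisons.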


2018 ◽  
Vol 10 (11) ◽  
pp. 112
Author(s):  
Jialu Xu ◽  
Feiyue Ye

With the explosion of web information, search engines have become the main tools for information retrieval. However, most queries submitted in web search are ambiguous and multifaceted. Understanding the queries and mining query intention is critical for search engines. In this paper, we present a novel query recommendation algorithm that combines query information and URL information to obtain broad and accurate query relevance. The query-based part of the relevance calculation uses query co-occurrence and query embedding vectors. Adding ranking information to query-URL pairs allows the strength of the relationship between a query and a URL to be calculated more precisely. Empirical experiments are performed on the AOL log. The results demonstrate the effectiveness of our proposed query recommendation algorithm, which achieves superior performance compared to other algorithms.
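
As a rough illustration of the co-occurrence signal only (not the paper's full algorithm, which also uses query embeddings and query-URL information), the sketch below counts how often two queries appear together and recommends the most frequent companions. Grouping queries by search session is an assumption made here, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence(sessions):
    # Count, for every pair of distinct queries, the number of sessions
    # in which both appear. Sessions are assumed to be lists of queries.
    counts = defaultdict(Counter)
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts, query, k=3):
    # Rank candidate queries by co-occurrence count, ties alphabetically.
    ranked = sorted(counts.get(query, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [candidate for candidate, _ in ranked[:k]]
```

For an ambiguous query such as "jaguar", the companions with the highest counts would surface the different intents that users pursued in the same sessions.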


Author(s):  
Partha Pradip Adhikari ◽  
Satya Bhusan Paul

Objective: Indian traditional medicine, a foundation of the age-old practice of medicine in the world, has played an essential role in human health care and welfare since its inception. Likewise, each traditional medicine has its own regional influence and is dominant in West Asian nations (India, Pakistan, Tibet, and so forth), East Asian nations (China, Korea, Japan, Vietnam, and so forth), Africa, and South and Central America. This article attempts to illuminate Indian traditional medical service and its importance, based on recent methodical reviews. Methods: Web search engines such as Google, Science Direct, and Google Scholar were searched for reviews as well as meta-analyses. Results: There is a long-running debate between individuals who use Indian traditional medicines for different ailments and disorders and those who depend on present-day modern medicine for a cure. The debate between modern and traditional medicine comes down to a basic truth: each person, regardless of education or sickness, ought to be informed about the facts concerning their illness and the side effects associated with medicines. Therapeutic knowledge of Indian traditional medicine has given rise to various traditional approaches with similar or different theories and methodologies, which are of regional significance. Conclusion: The current review will help investigators involved in traditional medicine to extend research on Indian traditional medicine in the near future and to explore its phytochemicals.


2013 ◽  
Vol 12 (3) ◽  
pp. 287-305 ◽  
Author(s):  
Mingming Zhou

Traditional approaches to researching self-regulated learning (SRL) fail to capture how learners actually employ studying tactics, how tactics are strategically adapted to specific learning contexts, and how learners adapt tactics and interweave them to form an efficient strategy. Computer traces can capture SRL “on the fly,” and enable researchers to track learning events in a nonlinear environment without disrupting the learner’s thinking or navigation through content. More importantly, data obtained in real time allow “virtual” re-creation of learners’ actions during studying. Traces were collected from 107 Chinese university students while they solved assigned problems by searching the web. Linking the students’ regulatory activities during online search to their goal profiles showed that mastery-approach-dominant students were the most strategic, whereas performance-avoidance-dominant students were the least. Moderately motivated students showed a mixed pattern of deep and surface study strategies. Implications of the findings are also discussed.


2006 ◽  
pp. 63-1-63-16
Author(s):  
Amy N. Langville ◽  
Carl D. Meyer

Author(s):  
Qiaozhu Mei ◽  
Dragomir Radev

This chapter is a basic introduction to text information retrieval. Information Retrieval (IR) refers to the activities of obtaining information resources (usually in the form of textual documents) from a much larger collection, which are relevant to an information need of the user (usually expressed as a query). Practical instances of an IR system include digital libraries and Web search engines. This chapter presents the typical architecture of an IR system, an overview of the methods corresponding to the design and the implementation of each major component of an information retrieval system, a discussion of evaluation methods for an IR system, and finally a summary of recent developments and research trends in the field of information retrieval.
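
The abstract does not say which evaluation methods the chapter covers. As an assumption, the sketch below shows two measures commonly used to evaluate a ranked result list against a set of documents judged relevant: precision at k and average precision. The function names are illustrative.

```python
def precision_at_k(ranked, relevant, k):
    # Fraction of the top-k results that are relevant (k must be positive).
    top = ranked[:k]
    return sum(1 for doc in top if doc in relevant) / k


def average_precision(ranked, relevant):
    # Mean of the precision values measured at each rank where a relevant
    # document appears, divided by the total number of relevant documents.
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0
```

Averaging `average_precision` over a set of queries gives mean average precision, a single-number summary that is often used to compare retrieval systems.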


Author(s):  
Teresa Numerico

We can find the first anticipation of the World Wide Web's hypertextual structure in Bush's 1945 paper, where he described a “selection” and storage machine called the Memex, capable of keeping the useful information of a user and connecting it to other relevant material present in the machine or added by other users. We will argue that Vannevar Bush conceived this type of machine because of his involvement with analogical devices. During the 1930s, in fact, he invented and built the Differential Analyzer, a powerful analogue machine used to calculate various relevant mathematical functions. The model of the Memex is not the digital one, because it relies on another form of data representation that emulates the procedures of memory more than the attitude of the logic used by the intellect. Memory seems to select and arrange information according to association strategies, i.e., using analogies and connections that are very often arbitrary, sometimes even chaotic and completely subjective. The organization of information and the knowledge creation process suggested by logic and symbolic formal representation of data is deeply different from the former one, though the logic approach is at the core of the birth of computer science (i.e., the Turing Machine and the Von Neumann Machine). We will discuss the issues raised by these two “visions” of information management and the influences of the philosophical tradition of the theory of knowledge on the hypertextual organization of content. We will also analyze the consequences of these different attitudes for information retrieval techniques in a hypertextual environment such as the web. Our position is that it is necessary to take into account the nature and the dynamic social topology of the network when we choose information retrieval methods for the network; otherwise, we risk creating a misleading service for the end user of web search tools (i.e., search engines).

