search logs
Recently Published Documents

TOTAL DOCUMENTS: 80 (five years: 17)
H-INDEX: 14 (five years: 2)

2022 ◽  
Vol 40 (3) ◽  
pp. 1-24
Author(s):  
Jiashu Zhao ◽  
Jimmy Xiangji Huang ◽  
Hongbo Deng ◽  
Yi Chang ◽  
Long Xia

In this article, we propose a Latent Dirichlet Allocation (LDA)–based topic-graph probabilistic personalization model for Web search. This model represents a user in a latent topic graph and simultaneously estimates the probabilities that the user is interested in the topics, as well as the probabilities that the user is not interested in the topics. For a given query issued by the user, webpages more relevant to the interesting topics are promoted, and webpages more relevant to the non-interesting topics are penalized. In particular, we simulate a user’s search intent by building two profiles: a positive user profile for the probabilities that the user is interested in the topics, and a corresponding negative user profile for the probabilities that the user is not interested in them. The profiles are estimated from the user’s search logs: a clicked webpage is assumed to include interesting topics, while a skipped (viewed but not clicked) webpage is assumed to cover topics the user finds non-interesting. These estimations are performed in the latent topic space generated by LDA. Moreover, we propose a new approach to estimate the correlation between a given query and the user’s search history, so as to determine how much personalization should be applied to the query. We compare our proposed models with several strong baselines, including state-of-the-art personalization approaches. Experiments conducted on a large-scale collection of real user search logs illustrate the effectiveness of the proposed models.
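As a rough illustration of the promote/penalize idea described in this abstract (not the authors' implementation; all function and variable names are hypothetical), a re-ranking step combining a base relevance score with positive and negative topic profiles might look like this, where the topic vectors would come from LDA inference:

```python
# Hypothetical sketch: promote documents matching the positive (interested)
# profile and penalize documents matching the negative (not-interested) one.

def personalized_score(base_relevance, doc_topics, pos_profile, neg_profile, weight=0.5):
    """Combine query relevance with a topic-profile adjustment."""
    promote = sum(p * t for p, t in zip(pos_profile, doc_topics))
    penalize = sum(n * t for n, t in zip(neg_profile, doc_topics))
    return base_relevance + weight * (promote - penalize)

def rerank(results, pos_profile, neg_profile):
    """results: list of (doc_id, base_relevance, doc_topic_vector)."""
    scored = [(personalized_score(rel, topics, pos_profile, neg_profile), doc)
              for doc, rel, topics in results]
    return [doc for _, doc in sorted(scored, reverse=True)]
```

In this sketch the positive profile would be estimated from clicked pages and the negative profile from skipped ones, as the abstract describes.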


Author(s):  
Markus Fischer ◽  
Kristof Komlossy ◽  
Benno Stein ◽  
Martin Potthast ◽  
Matthias Hagen
Keyword(s):  

2021 ◽  
pp. 016555152198953
Author(s):  
Paul H Cleverley ◽  
Fionnuala Cousins ◽  
Simon Burnett

COVID-19 has created unprecedented organisational challenges, yet no study has examined its impact on information search. A case study in a knowledge-intensive organisation examined 2.5 million search queries made during the pandemic. A surge of unique users and COVID-19 search queries in March 2020 may equate to ‘peak uncertainty and activity’, demonstrating the importance of corporate search engines in times of crisis. Search volumes dropped 24% after lockdowns; an ‘L-shaped’ recovery may be a surrogate for business activity. COVID-19 search queries transitioned from awareness to impact, strategy, response, and ways of working, which may influence future search design. Low click-through rates imply that some information needs were not met, and searches on mental health increased. In extreme situations (i.e. a pandemic), companies may need to move faster, monitoring and exploiting their enterprise search logs in real time, as these reflect uncertainty and anxiety within the enterprise.
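The click-through-rate signal mentioned in this abstract is straightforward to compute from raw query logs; the following is a minimal sketch (the event format and names are hypothetical, not the study's actual pipeline):

```python
from collections import defaultdict

# Hypothetical sketch: per-query click-through rate from a search log where
# each event records a query string and whether any result was clicked.

def click_through_rates(events):
    """events: iterable of (query, clicked) pairs; returns query -> CTR."""
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for query, clicked in events:
        impressions[query] += 1
        clicks[query] += int(clicked)
    return {q: clicks[q] / impressions[q] for q in impressions}
```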


2021 ◽  
Vol 36 (1) ◽  
pp. WI2-C_1-10
Author(s):  
Yusei Nakata ◽  
Naoki Muramoto ◽  
Takehiro Yamamoto ◽  
Sumio Fujita ◽  
Hiroaki Ohshima

2021 ◽  
Vol 48 (3) ◽  
pp. 219-230
Author(s):  
Mingfang Wu ◽  
Ying-Hsang Liu ◽  
Rowan Brownlee ◽  
Xiuzhen Zhang

In this paper, we present a case study of how well subject metadata (comprising headings from an international classification scheme) has been deployed in a national data catalogue, and how often data seekers use subject metadata when searching for data. Through an analysis of user search behaviour as recorded in search logs, we find evidence that users utilise subject metadata for data discovery. Since approximately half of the records ingested by the catalogue did not include subject metadata at the time of harvest, we experimented with automatic subject classification approaches to enrich these records and provide additional support for user search and data discovery. Our results show that automatic methods work well for well-represented subject categories, which tend to have features that distinguish them from other categories. Our findings have implications for data catalogue providers: they should invest more effort in enhancing the quality of data records by adequately describing records in under-represented subject categories.
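The abstract does not specify which automatic classification method was used; as a generic illustration of assigning a subject category to a record from its text, here is a minimal multinomial Naive Bayes sketch over bag-of-words features (all names and the toy categories are hypothetical):

```python
import math
from collections import Counter

# Hypothetical sketch: assign a subject category to a metadata record's text
# using multinomial Naive Bayes with add-alpha smoothing and uniform priors.

def train(labeled_docs):
    """labeled_docs: list of (category, text); returns per-category word counts."""
    counts = {}
    for category, text in labeled_docs:
        counts.setdefault(category, Counter()).update(text.lower().split())
    return counts

def classify(text, counts, alpha=1.0):
    """Pick the category maximizing the smoothed log-likelihood."""
    vocab = {w for wc in counts.values() for w in wc}
    best, best_lp = None, float("-inf")
    for category, wc in counts.items():
        total = sum(wc.values())
        lp = sum(math.log((wc[w] + alpha) / (total + alpha * len(vocab)))
                 for w in text.lower().split())
        if lp > best_lp:
            best, best_lp = category, lp
    return best
```

Such a model performs best when a category's vocabulary is distinctive and well represented in training data, which matches the finding reported above.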


Author(s):  
Foyzul Hassan ◽  
Chetan Bansal ◽  
Nachiappan Nagappan ◽  
Thomas Zimmermann ◽  
Ahmed Hassan Awadallah
Keyword(s):  

2020 ◽  
Vol 1 ◽  
pp. 1-21
Author(s):  
Haiqi Xu ◽  
Ehsan Hamzei ◽  
Enkhbold Nyamsuren ◽  
Han Kruiger ◽  
Stephan Winter ◽  
...  

Abstract. Understanding the syntactic and semantic structure of geographic questions is a necessary step towards true geographic question-answering (GeoQA) machines. Geographic question corpora form the empirical basis for understanding the capabilities expected of GeoQA systems. Available corpora in English have mostly been drawn from generic Web search logs or limited user studies, supporting the focus of GeoQA systems on retrieving factoids: factual knowledge about particular places and everyday processes. Yet the majority of questions enquired about in the spatial sciences go beyond simple place facts, with more complex analytical intents informing the questions. In this paper, we introduce a new corpus of geo-analytic questions drawn from English textbooks and scientific articles. We analyse and compare this corpus with two general-purpose GeoQA corpora in terms of grammatical complexity and semantic concepts, using a new parsing method that allows us to differentiate and quantify patterns of a question’s intent.


2020 ◽  
Vol 9 (1) ◽  
pp. 2046-2048

One of the major challenges a developer may face is security threats in labelled data, which comprises system logs, network traffic, or other enriched data with a threat/non-threat classification. A few earlier studies categorised URLs into specific categories such as Arts or Technology. In this paper, the main research focus is the classification of users based on their search logs (URLs). Because it is difficult to differentiate users manually from search logs, we train a machine learning model that takes raw data as input and classifies each user as genuine or malign; this model helps with intrusion detection and suspicious-activity detection. We first gather data on past malicious URLs as a training set for a Naïve Bayes algorithm to detect malicious users. An effectively implemented KNN algorithm detects malign users with an accuracy of up to 94.28%. With machine learning classifiers such as Naïve Bayes, KNN, and Random Forest, we can classify malign and genuine users.
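As a rough illustration of the KNN step described in this abstract (the feature set here is invented for illustration and is not the feature set, data, or accuracy reported by the paper), a nearest-neighbour vote over simple URL-derived features might look like this:

```python
# Hypothetical sketch: k-nearest-neighbour classification of users as genuine
# or malign from toy URL features; all names and features are illustrative.

def url_features(url):
    """Toy features: URL length, digit count, count of '-' and '@' characters."""
    return [len(url), sum(ch.isdigit() for ch in url), url.count("-") + url.count("@")]

def knn_classify(url, labeled_urls, k=3):
    """labeled_urls: list of (url, label); majority vote among the k nearest."""
    feats = url_features(url)
    nearest = sorted(labeled_urls,
                     key=lambda item: sum((a - b) ** 2
                                          for a, b in zip(url_features(item[0]), feats)))
    votes = [label for _, label in nearest[:k]]
    return max(set(votes), key=votes.count)
```

A real system would use richer lexical and host-based features and a labelled corpus of known-malicious URLs, as the abstract describes.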

