PUBLISHING SEARCH LOGS PRIVACY GUARANTEE FOR USER SENSITIVE INFORMATION

Author(s):  
J.ARUNA SANTHI ◽  
CH.LAKSHMI KUMARI ◽  
NANDITHA NANDITHA ◽  
B. MEENAKSHI

Search engine companies maintain search logs to store the histories of their users' search queries. These search logs are gold mines for researchers. However, search engine companies must take care when publishing search logs in order to protect users' sensitive information. In this paper we analyze an algorithm for publishing the frequent keywords, queries, and clicks of a search log. Before turning to the Zealous algorithm, we discuss how different variants of anonymity failed to provide both good utility (publishing frequent items) and strong privacy for search logs. We then show how the Zealous algorithm provides good utility and strong privacy for publishing search logs.
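The abstract does not spell out the Zealous steps. As a rough illustration only, the sketch below assumes a two-threshold scheme of the kind usually associated with that name: cap each user's contribution at `m` distinct items, drop items whose count is below `tau`, add Laplace noise to the surviving counts, and drop items whose noisy count is below `tau_prime`. The function names, parameter names, and toy log are hypothetical, and no privacy guarantee is claimed for any particular parameter choice.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two unit exponentials is a standard Laplace variate.
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def publish_frequent_items(user_logs, m, tau, noise_scale, tau_prime, rng=None):
    """Return {item: noisy_count} for items that pass both thresholds.

    user_logs maps a user id to that user's list of items (keywords,
    queries, or clicks). All parameters are assumptions of this sketch.
    """
    rng = rng or random.Random()
    counts = Counter()
    for items in user_logs.values():
        # Keep at most m distinct items per user, in first-seen order.
        distinct = list(dict.fromkeys(items))[:m]
        counts.update(distinct)

    published = {}
    for item, count in counts.items():
        if count < tau:  # first threshold, on the true count
            continue
        noisy = count + laplace_noise(noise_scale, rng)
        if noisy >= tau_prime:  # second threshold, on the noisy count
            published[item] = noisy
    return published


# Toy log; noise_scale=0.0 switches the noise off so the filtering is visible.
toy_logs = {
    "u1": ["flu", "flu", "cat"],
    "u2": ["flu", "dog"],
    "u3": ["flu", "cat"],
}
demo = publish_frequent_items(toy_logs, m=2, tau=2, noise_scale=0.0, tau_prime=2)
```

With the noise switched off, only `flu` and `cat` survive, because `dog` appears in a single user's history. In a real release the noise scale would be strictly positive, so the published counts would differ from the true ones.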

Author(s):  
S. Belinsha ◽  
A.P.V. Raghavendra

Search engines are widely used by web users, and search engine companies strive to produce the best search results. Search logs are records of the interactions between the user and the search engine. Various search patterns and user behaviors can be analyzed from these logs, which helps to enhance the search results. Publishing these search logs to a third party for analysis raises a privacy issue. The Zealous algorithm, which filters the frequent search items in the search log, loses utility in the course of providing privacy. The proposed Confess algorithm extends this work by qualifying the infrequent search items in the log, which tends to increase the utility of the search log while preserving privacy. The Confess algorithm qualifies the infrequent keywords and URL clicks in the search log and publishes them along with the frequent items.


Author(s):  
Adan Ortiz-Cordova ◽  
Bernard J. Jansen

In this research study, the authors investigate the association between external searching, which is searching on a web search engine, and internal searching, which is searching on a website. They classify 295,571 external-internal searches, where each search is composed of a query submitted to a web search engine and then one or more subsequent queries submitted to a commercial website by the same user. The authors examine 891,453 queries from all searches, of which 295,571 were external search queries and 595,882 were internal search queries. They algorithmically classify all queries into states, then cluster the searching episodes into major searching configurations and identify the most commonly occurring search patterns for external, internal, and external-to-internal searching episodes. The research implications of this study are that external sessions and internal sessions must be considered as part of a continuous search episode and that online businesses can leverage external search information to more effectively target potential consumers.


Author(s):  
S. A. Vlasova

The article describes an automated system for creating and maintaining a database of the scientific works of an academic institution’s employees, developed by specialists of the Joint Supercomputer Center RAS. The system’s information base contains data about the following objects: authors, related organizations (their places of work), publications at the analytical and monographic levels, sources (publications at the summary level, such as journals and collections), and reports made at scientific conferences, symposia, and seminars. The system has an administrative module designed for entering and editing data. The user module of the system is a special search engine that finds information about publications, sources, reports, events, and authors by processing search queries. A distinctive feature of the system is the concept of «equivalent» objects. Such objects include «persons» corresponding to the same author with different spellings of the last name in the bibliographic descriptions of publications, organizations with different versions of their names, and articles published without changes in different languages.
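The abstract does not say how «equivalent» objects are stored. One common way to model such equivalence classes is a union-find (disjoint-set) structure, sketched below as an assumption rather than as the system's actual design. The class name `EquivalenceIndex`, its methods, and the record keys are hypothetical.

```python
class EquivalenceIndex:
    """Groups record keys (e.g. name spellings) into equivalence classes."""

    def __init__(self):
        self._parent = {}

    def find(self, key):
        # Register unseen keys as their own class representative.
        self._parent.setdefault(key, key)
        root = key
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited key directly at the root.
        while self._parent[key] != root:
            next_key = self._parent[key]
            self._parent[key] = root
            key = next_key
        return root

    def union(self, a, b):
        # Declare two records equivalent by merging their classes.
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def equivalent(self, a, b):
        return self.find(a) == self.find(b)


# Two spellings of one author name, plus an unrelated placeholder record.
index = EquivalenceIndex()
index.union("S. A. Vlasova", "S.A. Vlasova")
same_author = index.equivalent("S.A. Vlasova", "S. A. Vlasova")
different_author = index.equivalent("S. A. Vlasova", "A. N. Other")
```

A search over such an index can expand a query for one spelling to every record in the same class, which is the behavior the «equivalent» objects are described as enabling.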


Author(s):  
Michael Chau ◽  
Yan Lu ◽  
Xiao Fang ◽  
Christopher C. Yang

More non-English content is now available on the World Wide Web, and the number of non-English users on the Web is increasing. While it is important to understand the Web searching behavior of these non-English users, many previous studies on Web query logs have focused on analyzing English search logs, and their results may not apply directly to other languages. In this chapter we discuss some methods and techniques that can be used to analyze search queries in Chinese. We also show an example of applying our methods to a Chinese Web search engine and report some interesting findings.


PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e5134 ◽  
Author(s):  
Feng Liang ◽  
Peng Guan ◽  
Wei Wu ◽  
Desheng Huang

Background: Influenza epidemics pose significant social and economic challenges in China. Internet search query data have been identified as a valuable source for the detection of emerging influenza epidemics. However, the selection of the search queries and the adoption of prediction methods are crucial challenges when it comes to improving predictions. The purpose of this study was to explore the application of the Support Vector Machine (SVM) regression model in merging search engine query data and traditional influenza data. Methods: The official monthly reported number of influenza cases in Liaoning province in China was acquired from the China National Scientific Data Center for Public Health from January 2011 to December 2015. Based on Baidu Index, a publicly available search engine database, search queries potentially related to influenza over the corresponding period were identified. An SVM regression model was built for prediction, and its three parameters (C, γ, ε) were chosen by leave-one-out cross-validation (LOOCV) during model construction. The model’s performance was evaluated using Root Mean Square Error, Root Mean Square Percentage Error, and Mean Absolute Percentage Error. Results: In total, 17 search queries related to influenza were generated through the initial query selection approach and were adopted to construct the SVM regression model, including nine queries in the same month, three queries at a lag of one month, one query at a lag of two months, and four queries at a lag of three months. The SVM model performed well with the parameters (C = 2, γ = 0.005, ε = 0.0001), based on the ensemble data integrating the influenza surveillance data and Baidu search query data.
Conclusions: The results demonstrated the feasibility of using internet search engine query data as a complementary data source for influenza surveillance and the efficiency of the SVM regression model in tracking influenza epidemics in Liaoning.
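The three error metrics named in the Methods can be written down compactly. The sketch below uses their standard textbook definitions, which is an assumption, since the abstract does not give formulas. The case counts in the example are invented for illustration and are not data from the study.

```python
import math


def rmse(actual, predicted):
    # Root Mean Square Error: square root of the mean squared residual.
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def rmspe(actual, predicted):
    # Root Mean Square Percentage Error: like RMSE, but each residual is
    # divided by the observed value first (observed values must be nonzero).
    return math.sqrt(
        sum(((a - p) / a) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mape(actual, predicted):
    # Mean Absolute Percentage Error: mean of |residual| / |observed|.
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Invented monthly case counts and predictions, for illustration only.
observed = [100.0, 200.0]
forecast = [110.0, 180.0]
scores = {
    "RMSE": rmse(observed, forecast),
    "RMSPE": rmspe(observed, forecast),
    "MAPE": mape(observed, forecast),
}
```

RMSE is expressed in the units of the case counts, while RMSPE and MAPE are relative errors, so they remain comparable between months with very different case volumes.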


Author(s):  
Sebastian Wenning ◽  

The present article examines success factors for using and implementing Google Ads in enterprises. In order to assess the importance of Google Ads for marketing success, the first step is to classify the importance of Google as a search platform. Measured in terms of page views, Google was the clear market leader in the search engine market with a market share of 87.66 percent, ahead of Bing and Yahoo. Its 5.8 billion search queries per day (about two trillion search queries per year) also generate opportunities for companies to present themselves and win customers. The results of this research suggest that the key to successful online marketing in search engine advertising (SEA) is relevance. Only if content is created that, in addition to the actual promotion of a product or service, leads the user to further, goal-oriented information does the campaign experience a quality upgrade, which affects not only the ranking and quality factors but also the conversion behavior of customers. Based on a literature review, carried out with emphasis on empirical studies and essays published since 2010, four main success factors were derived: the presentation and content of the website, keyword marketing, the complementary use of analytics, and ad extensions.


We propose aggregating the frequent lists in the top search engine results to mine the aspects of a query, and we implement a method called QDMiner. More specifically, QDMiner extracts lists from free text, HTML tags, and repeated regions in the top search engine results, groups them into clusters according to the items they contain, and then ranks the clusters and items depending on how the lists and items appear in the top results. The goal of mining query aspects differs from that of query recommendation. We further evaluate the importance of lists and find better query aspects by looking for exact similarities between lists and penalizing duplicate lists. Experimental results show that many lists are available and that QDMiner can find useful query aspects. The proposed approach is generic and does not depend on knowledge of any particular domain; as a result, it can handle open-domain queries. It is also query-dependent: instead of a fixed scheme for all queries, we extract aspects from the top retrieved documents for each query.
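The grouping and ranking idea can be illustrated with a toy example. The sketch below is not QDMiner itself: it assumes lists have already been extracted from the result pages, groups them greedily by Jaccard similarity of their items, and ranks groups by how many lists support them. The function names, the 0.5 threshold, and the sample lists are all hypothetical.

```python
def jaccard(a, b):
    # Overlap between two item sets, from 0.0 (disjoint) to 1.0 (identical).
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_lists(lists, threshold=0.5):
    """Greedily merge similar lists and rank the groups by support.

    lists is an iterable of lists of strings, one per list found in the
    search results. Returns dicts with "items" (set) and "support" (int).
    """
    groups = []
    for raw in lists:
        # Normalize items so that "Red" and " red" count as the same item.
        items = {entry.strip().lower() for entry in raw if entry.strip()}
        if not items:
            continue
        for group in groups:
            if jaccard(items, group["items"]) >= threshold:
                group["items"] |= items
                group["support"] += 1
                break
        else:
            groups.append({"items": set(items), "support": 1})
    # Groups backed by more lists are ranked first.
    return sorted(groups, key=lambda g: g["support"], reverse=True)


# Hypothetical lists, as if pulled from the top results for one query.
sample_lists = [
    ["Red", "Blue"],
    ["red", "blue", "green"],
    ["Small", "Large"],
]
ranked = group_lists(sample_lists)
```

Here the two colour lists merge into one group with support 2, which outranks the size list. A fuller treatment would also weight items within each group and discount lists that come from duplicated pages, as the abstract indicates.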


Tradterm ◽  
2021 ◽  
Vol 37 (2) ◽  
pp. 460-487
Author(s):  
Adauri Brezolin

Although it might appear contradictory to investigate noncanonical phraseological combinations in corpora, corpus linguistics research has revealed that they exceed canonical forms in number (Philip 2008). This paper discusses the idea of fixedness by analyzing variant forms of idioms and asking whether they qualify as wordplay. The Web, our data source, is used to collect such noncanonical occurrences in both English and Portuguese using keywords on the Google Search Engine. Our discussion mainly draws on studies relating to fixed phrases (Kjellmer 1991; Granger & Paquot 2008; Tagnin 2013), phraseological skeletons (Renouf & Sinclair 1991; Philip 2008), and idiom transformations (Veisbergs 1997; Barta 2005). Due attention is also given to search queries for nonstandard forms of fixed expressions in corpora (Philip 2008) and to the translation of idiom-based wordplay (Veisbergs 1997; Brezolin 2020).


2011 ◽  
Vol 40 ◽  
pp. 677-700 ◽  
Author(s):  
F. Wu ◽  
J. Madhavan ◽  
A. Halevy

Many web-search queries serve as the beginning of an exploration of an unknown space of information, rather than looking for a specific web page. To answer such queries effectively, the search engine should attempt to organize the space of relevant information in a way that facilitates exploration. We describe the Aspector system that computes aspects for a given query. Each aspect is a set of search queries that together represent a distinct information need relevant to the original search query. To serve as an effective means to explore the space, Aspector computes aspects that are orthogonal to each other and have high combined coverage. Aspector combines two sources of information to compute aspects. We discover candidate aspects by analyzing query logs, and cluster them to eliminate redundancies. We then use a mass-collaboration knowledge base (e.g., Wikipedia) to compute candidate aspects for queries that occur less frequently and to group together aspects that are likely to be “semantically” related. We present a user study that indicates that the aspects we compute are rated favorably against three competing alternatives: related searches proposed by Google, cluster labels assigned by the Clusty search engine, and navigational searches proposed by Bing.


Author(s):  
S.A. Vlasova ◽  

The article describes an automated system for creating and maintaining a database of the scientific works of an academic institution’s employees, developed by specialists of the Joint Supercomputer Center RAS. The system’s information base contains data about authors, related organizations (their places of work), publications at the analytical and monographic levels, sources (publications at the summary level, such as journals and collections), and reports made at scientific conferences, symposia, and seminars. The system has an administrative module designed for entering and editing data. The user module of the system is a special search engine that finds information about publications, sources, reports, events, and authors by processing search queries.

