A Query-Dependent Duplicate Detection Approach for Large Scale Search Engines

Author(s):  
Shaozhi Ye ◽  
Ruihua Song ◽  
Ji-Rong Wen ◽  
Wei-Ying Ma


2021 ◽  
pp. 089443932110068
Author(s):  
Aleksandra Urman ◽  
Mykola Makhortykh ◽  
Roberto Ulloa

We examine how six search engines filter and rank information for queries on the U.S. 2020 presidential primary elections under default (that is, nonpersonalized) conditions. To do so, we use an algorithmic auditing methodology in which virtual agents conduct large-scale analysis of algorithmic information curation in a controlled environment. Specifically, we look at the text search results for the queries “us elections,” “donald trump,” “joe biden,” and “bernie sanders” on Google, Baidu, Bing, DuckDuckGo, Yahoo, and Yandex during the 2020 primaries. Our findings indicate substantial differences in the search results between search engines and multiple discrepancies within the results generated for different agents using the same search engine. This highlights that whether users see certain information is decided by chance due to the inherent randomization of search results. We also find that some search engines prioritize different categories of information sources with respect to specific candidates. These observations demonstrate that algorithmic curation of political information can create information inequalities between search engine users even under nonpersonalized conditions. Such inequalities are particularly troubling considering that search results are highly trusted by the public and can shift the opinions of undecided voters, as demonstrated by previous research.
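
The discrepancies between agents described in this abstract can be illustrated with a simple set-overlap measure. The sketch below uses Jaccard similarity as an assumed metric (the abstract does not say which measure the authors used), and the function name and result lists are hypothetical.

```python
def jaccard(results_a, results_b):
    """Jaccard similarity between two collections of result URLs."""
    set_a, set_b = set(results_a), set(results_b)
    if not set_a and not set_b:
        return 1.0  # two empty result lists are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical top results seen by two agents issuing the same query
# on the same search engine (illustrative data, not from the study).
agent_1 = ["a.example/1", "b.example/2", "c.example/3"]
agent_2 = ["a.example/1", "c.example/3", "d.example/4"]

overlap = jaccard(agent_1, agent_2)
```

A value of 1.0 would mean both agents saw the same set of results; lower values indicate that the result sets diverge. This measure ignores ranking order, so a rank-aware metric would be needed to capture differences in position.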


Author(s):  
Denis Shestakov

Finding information on the Web using a web search engine is one of the primary activities of today’s web users. For a majority of users, the results returned by conventional search engines are an essentially complete set of links to all pages on the Web relevant to their queries. However, current-day search engines do not crawl and index a significant portion of the Web; hence, web users relying on search engines alone are unable to discover and access a large amount of information from the nonindexable part of the Web. Specifically, dynamic pages generated based on parameters provided by a user via web search forms are not indexed by search engines and cannot be found in search engines’ results. Such search interfaces provide web users with online access to myriads of databases on the Web. To obtain information from a web database of interest, a user issues a query by specifying query terms in a search form and receives the query results: a set of dynamic pages that embed the required information from the database. At the same time, issuing a query via an arbitrary search interface is an extremely complex task for any kind of automatic agent, including web crawlers, which, at least up to the present day, do not even attempt to pass through web forms on a large scale.


Water ◽  
2019 ◽  
Vol 11 (6) ◽  
pp. 1154
Author(s):  
Chao-Chih Lin ◽  
Hund-Der Yeh

This research introduces an inverse transient-based optimization approach to automatically detect potential faults, such as leaks, partial blockages, and distributed deteriorations, within pipelines or a water distribution network (WDN). The optimization approach is named the Pipeline Examination Ordinal Symbiotic Organism Search (PEOS). A modified steady hydraulic model that considers the effects of pipe aging within a system is used to determine the steady nodal heads and piping flow rates. After a transient excitation is applied, the transient behaviors in the system are analyzed using the method of characteristics (MOC). A preliminary screening mechanism sifts the initial organisms (solutions) so that better-performing ones are retained, which avoids most of the unnecessary calculations caused by incorrect solutions within the PEOS framework. Further, a symbiotic organism search (SOS) imitates symbiotic relationship strategies to move organisms toward the current optimal organism and eliminate the worst ones. Two experiments on leak and blockage detection in a single pipeline, previously presented in the literature, were used to verify the applicability of the proposed approach. Two hypothetical WDNs, a small-scale and a large-scale system, were considered to validate the efficiency, accuracy, and robustness of the proposed approach. The simulation results indicated that the proposed approach obtained more reliable and efficient optimal results than other algorithms did. We believe the proposed fault detection approach is a promising technique for detecting faults in field applications.


2017 ◽  
Vol 2017 ◽  
pp. 1-12 ◽  
Author(s):  
Fang Lyu ◽  
Yaping Lin ◽  
Junfeng Yang

The huge benefits of the mobile application industry have attracted a large number of developers and, with them, attackers. Application repackaging facilitates the distribution of most Android malware. It is a serious threat to the entire Android ecosystem, as it not only compromises the security and privacy of app users but also plunders app developers’ income. Although many approaches have been proposed to address this issue, plagiarists fight back by packing their malicious code with the help of commercial packers. Previous works either do not consider the packing issue or rely on time-consuming computations, which do not scale to large-scale real-world scenarios. In this paper, we propose FUIDroid, a novel two-phase app clone detection system that can detect packed cloned apps. FUIDroid includes a function-based fast selection phase, which quickly selects suspicious apps by analyzing apps’ descriptions, and a further UI-based accurate detection phase, which refines the detection result. We evaluate our system on two sets of apps. Results from an experiment on 320 packed samples demonstrate that FUIDroid is resilient to packed apps. The evaluation on more than 150,000 real-world apps shows the efficiency of FUIDroid in large-scale scenarios.


PROTEOMICS ◽  
2010 ◽  
Vol 10 (6) ◽  
pp. 1172-1189 ◽  
Author(s):  
Wen Yu ◽  
J. Alex Taylor ◽  
Michael T. Davis ◽  
Leo E. Bonilla ◽  
Kimberly A. Lee ◽  
...  

2013 ◽  
Vol 40 (6) ◽  
pp. 2287-2296 ◽  
Author(s):  
Damaris Fuentes-Lorenzo ◽  
Norberto Fernández ◽  
Jesús A. Fisteus ◽  
Luis Sánchez
