A Hybrid Approach for Web Change Detection

Author(s):  
Sakher Khalil Alqaaidi

Search engines save copies of crawled web pages to provide instant search results. These saved copies become outdated as the original pages change and gain new information and new links. Most websites do not submit such changes to search engines, so search engines cannot rely mainly on website-submitted change notifications. Keeping pages fresh in a search engine is important both for computing accurate page ranks and for providing up-to-date information. Several techniques have been developed to improve how search engines update pages. In this paper, the author combines two well-known techniques into a new one and evaluates it experimentally, obtaining better results across different experimental cases.
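The abstract does not name the two techniques it combines. The following is therefore only a hypothetical illustration of combining two common change signals, a fingerprint of the page text and a comparison of its outgoing links. It is not the author's method. The names `Snapshot`, `take_snapshot` and `detect_changes` are invented for this sketch, and the regex-based HTML handling is deliberately crude.

```python
import hashlib
import re
from dataclasses import dataclass


# Hypothetical stored copy of a crawled page: a content fingerprint plus its outgoing links.
@dataclass(frozen=True)
class Snapshot:
    content_hash: str
    links: frozenset


def take_snapshot(html: str) -> Snapshot:
    """Fingerprint the visible text and collect href targets (crude regex-based sketch)."""
    text = re.sub(r"<[^>]+>", " ", html)       # drop tags, keep text
    text = " ".join(text.split()).lower()       # normalise whitespace and case
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    links = frozenset(re.findall(r'href="([^"]+)"', html))
    return Snapshot(digest, links)


def detect_changes(old: Snapshot, new: Snapshot) -> dict:
    """Combine two signals: did the text change, and which links were added or removed."""
    return {
        "content_changed": old.content_hash != new.content_hash,
        "new_links": sorted(new.links - old.links),
        "removed_links": sorted(old.links - new.links),
    }


if __name__ == "__main__":
    old = take_snapshot('<p>Hello</p><a href="/a">A</a>')
    new = take_snapshot('<p>Hello world</p><a href="/a">A</a><a href="/b">B</a>')
    print(detect_changes(old, new))
```

Because whitespace and case are normalised before hashing, purely cosmetic edits such as added line breaks do not count as a content change. A newly discovered link, such as `/b` above, would be a candidate for crawling.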

2011 ◽  
Vol 3 (4) ◽  
pp. 62-70 ◽  
Author(s):  
Stephen O’Neill ◽  
Kevin Curran

Search engine optimization (SEO) is the process of improving the visibility, volume and quality of traffic to a website or web page in search engines via the natural search results. SEO can also target other areas of search, including image search and local search. SEO is one of many strategies used for marketing a website, but it has been proven the most effective. An Internet marketing campaign may drive organic search results to websites or web pages, but it can also involve paid advertising on search engines. Each search engine has its own way of ranking the importance of a website. Some search engines focus on the content, while others review Meta tags to identify who a website's owner is and what its business is. Most engines use a combination of Meta tags, content, link popularity, click popularity and longevity to determine a site's ranking. To make matters more complicated, they change their ranking policies frequently. This paper provides an overview of search engine optimization strategies and pitfalls.


2013 ◽  
Vol 284-287 ◽  
pp. 3375-3379
Author(s):  
Chun Hsiung Tseng ◽  
Fu Cheng Yang ◽  
Yu Ping Tseng ◽  
Yi Yun Chang

Most Web users today rely heavily on search engines to gather information. Algorithms such as PageRank have been developed to achieve better search results. However, most Web search engines use keyword-based search and therefore have some inherent weaknesses. A well-known problem is that it is very difficult for search engines to infer semantics from user queries and returned results. Hence, despite efforts to rank search results, users may still have to navigate through a huge number of Web pages to locate the resources they want. In this research, the researchers developed a clustering-based methodology to improve the performance of search engines. Instead of extracting clustering features from the returned documents, the proposed method extracts them from the Delicious service, a tag-provider (social bookmarking) service. By using this information, the resulting system can benefit from crowd intelligence. The obtained information is then used to enhance the ordinary k-means algorithm and achieve better clustering results.
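As a point of reference for the baseline the paper improves on, the following sketch shows ordinary k-means applied to tag vectors. It does not implement the paper's enhanced variant, whose details the abstract does not give. The helper names, the tag lists and the deterministic farthest-first seeding are all assumptions made so the example is reproducible. The tag lists merely stand in for the kind of tags a bookmarking service such as Delicious would return for each result.

```python
import math
from collections import Counter


def tag_vector(tags, vocabulary):
    """Turn a document's tag list into an L2-normalised count vector over a fixed vocabulary."""
    counts = Counter(tags)
    vec = [float(counts[t]) for t in vocabulary]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means with deterministic farthest-first seeding."""
    # Seeding: start from the first point, then repeatedly add the point
    # farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


if __name__ == "__main__":
    vocab = ["python", "code", "recipe", "food"]
    docs = [["python", "code"], ["code", "python", "python"],
            ["recipe", "food"], ["food", "recipe", "recipe"]]
    print(kmeans([tag_vector(d, vocab) for d in docs], 2))
```

Here the two programming-tagged results fall into one cluster and the two cooking-tagged results into the other. Clustering on tags rather than on page text is what lets crowd-assigned labels drive the grouping.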


2019 ◽  
Vol 8 (2S8) ◽  
pp. 1541-1545

Search engine spam is created by spammers for commercial gain. Spammers apply various strategies to make their web pages appear on the first page of web search results. These strategies can keep good-quality web pages from appearing at the top of the search engine results page. Numerous algorithms have been devised to identify search engine spam, yet search engines are still affected by it. The search engine industry therefore needs better ways to filter search engine spam. The proposed study identifies spam in web search engines. Spammers try to use the most popular search keywords, popular links and advertising keywords in web pages. This strategy raises a page's ranking so that it appears at the top of the search results. The proposed method uses important features to detect spam pages, which are then classified with the C4.5 decision tree classifier. This method performs better than existing classification methods.
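C4.5 chooses each split by gain ratio, which is information gain divided by the split information (the entropy of the partition itself). The following sketch computes that criterion for one categorical feature. It covers only the split criterion and omits C4.5's handling of continuous attributes, missing values and pruning. The feature names and labels in the usage example are hypothetical stand-ins for the keyword and link features the abstract mentions.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(feature_values, labels):
    """C4.5 split criterion: information gain divided by split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected label entropy after splitting on this feature.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information penalises features that fragment the data into many values.
    split_info = entropy(feature_values)
    return gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    labels = ["spam", "spam", "ham", "ham"]
    popular_keyword_density = ["high", "high", "low", "low"]  # hypothetical feature
    unrelated_feature = ["a", "b", "a", "b"]                  # hypothetical feature
    print(gain_ratio(popular_keyword_density, labels))  # separates the classes perfectly
    print(gain_ratio(unrelated_feature, labels))        # carries no information
```

A tree builder would evaluate this score for every candidate feature at a node and split on the highest one. A feature that cleanly separates spam from non-spam pages scores 1.0 here, and an uninformative one scores 0.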


2021 ◽  
pp. 089443932110068
Author(s):  
Aleksandra Urman ◽  
Mykola Makhortykh ◽  
Roberto Ulloa

We examine how six search engines filter and rank information in relation to the queries on the U.S. 2020 presidential primary elections under the default—that is nonpersonalized—conditions. For that, we utilize an algorithmic auditing methodology that uses virtual agents to conduct large-scale analysis of algorithmic information curation in a controlled environment. Specifically, we look at the text search results for “us elections,” “donald trump,” “joe biden,” “bernie sanders” queries on Google, Baidu, Bing, DuckDuckGo, Yahoo, and Yandex, during the 2020 primaries. Our findings indicate substantial differences in the search results between search engines and multiple discrepancies within the results generated for different agents using the same search engine. It highlights that whether users see certain information is decided by chance due to the inherent randomization of search results. We also find that some search engines prioritize different categories of information sources with respect to specific candidates. These observations demonstrate that algorithmic curation of political information can create information inequalities between the search engine users even under nonpersonalized conditions. Such inequalities are particularly troubling considering that search results are highly trusted by the public and can shift the opinions of undecided voters as demonstrated by previous research.


2019 ◽  
Vol 71 (1) ◽  
pp. 54-71 ◽  
Author(s):  
Artur Strzelecki

Purpose: The purpose of this paper is to clarify how many removal requests are made, how often, and who makes them, as well as which websites are reported to search engines for removal from the search results.
Design/methodology/approach: The paper undertakes an in-depth analysis of more than 3.2bn pages removed from Google's search results at the request of reporting organizations from 2011 to 2018, and of over 460m pages removed from Bing's search results at the request of reporting organizations from 2015 to 2017. The paper focuses on pages that belong to the .pl country-code top-level domain (ccTLD).
Findings: Although the number of requests to remove data from search results has grown year on year, fewer URLs have been reported in recent years. Some requests are, however, unjustified and are rejected by teams representing the search engines. In reporting copyright violations, one company stands out (AudioLock.Net), accounting for 28.1 percent of all reports sent to Google; the top ten companies combined were responsible for 61.3 percent of all reports.
Research limitations/implications: As not every request can be published, the study is based only on what is publicly available. Also, data were assigned to Poland only on the basis of the ccTLD domain name (.pl); other domain extensions used by Polish internet users were not considered.
Originality/value: This is the first global analysis of data from the transparency reports published by search engine companies, as prior research has been based on specific notices.


Author(s):  
Novario Jaya Perdana

The accuracy of search engine results depends on the keywords used. Keywords that carry too little information can reduce the accuracy of the results, which makes searching for information on the internet hard work. In this research, software was built to create document keyword sequences. The software uses Google Latent Semantic Distance, which can extract relevant information from a document. This information is expressed as specific word sequences that can be used as keyword recommendations in search engines. The results show that the method for creating document keyword recommendations achieved high accuracy and could find the most relevant information in the top search results.


Author(s):  
Oğuzhan Menemencioğlu ◽  
İlhami Muharrem Orak

The semantic web works on producing machine-readable data and aims to deal with large amounts of data. The most important tool for accessing the data on the web is the search engine. Traditional search engines are insufficient given the amount of data contained in existing web pages. Semantic search engines extend traditional engines and overcome the difficulties those engines face. This paper summarizes the semantic web, the concepts of traditional and semantic search engines, and their infrastructure. Semantic search approaches are also detailed. A summary of the literature is provided, touching on trends; in this respect, the types of applications and the areas studied are considered. Based on data from two different years, trends on these points are analyzed and the impacts of the changes are discussed. The analysis shows that evaluation of the semantic web is ongoing and that new applications and areas are emerging. Multimedia retrieval is a new scope of semantic search, so multimedia retrieval approaches are discussed, and text and multimedia retrieval are analyzed within semantic search.


Author(s):  
Xiannong Meng ◽  
Song Xing

This chapter reports the results of a project that assesses the performance of several major search engines from various perspectives. The search engines studied include the Microsoft Search Engine (MSE), while it was in its beta test stage, AllTheWeb and Yahoo. Some comparisons also include other search engines such as Google and Vivisimo. The study collects statistics such as the average user response time, the average processing time for a query as reported by MSE, and the number of pages relevant to a query as reported by all the search engines involved. The project also studies the quality of the search results generated by MSE and the other search engines, using RankPower as the metric. We found that MSE performs well in speed and in the diversity of its query results, but is weaker than some other leading search engines on the other statistics. The contribution of this chapter is to review performance evaluation techniques for search engines and to use different measures to assess and compare the quality of different search engines, especially MSE.
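The abstract names RankPower as its quality metric but does not define it. The following sketch assumes the definition associated with Meng's work: the sum of the 1-based rank positions of the relevant results within the top N, divided by the square of the number of relevant results n. Under this definition lower values are better. Treat the formula as an assumption and check it against the chapter before relying on it. The function name `rank_power` is hypothetical.

```python
def rank_power(relevance):
    """RankPower under an assumed definition: (sum of 1-based ranks of relevant
    results in the top N) / n**2, where n is the number of relevant results.
    Lower is better; returns infinity when no result is relevant."""
    ranks = [i for i, is_relevant in enumerate(relevance, start=1) if is_relevant]
    n = len(ranks)
    if n == 0:
        return float("inf")
    return sum(ranks) / n ** 2


if __name__ == "__main__":
    # Relevance judgments for the top 4 results of two hypothetical engines.
    engine_a = [True, True, True, False]   # relevant results ranked first
    engine_b = [True, True, False, True]   # one relevant result pushed down
    print(rank_power(engine_a), rank_power(engine_b))
```

The metric rewards both finding relevant pages and placing them early. In this example both engines return three relevant pages, but engine A scores better because none of its relevant results is pushed down the list.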


2019 ◽  
Vol 16 (9) ◽  
pp. 3712-3716
Author(s):  
Kailash Kumar ◽  
Abdulaziz Al-Besher

This paper examines the overlap among the results retrieved by three major search engines, namely Google, Yahoo and Bing. A rigorous overlap analysis of these search engines was conducted on 100 random queries. The first ten web page results were considered, i.e., a hundred results from each search engine, and only non-sponsored results were taken into account. Each search engine has its own update frequency and ranks results by its own assessment of relevance. Moreover, sponsored search advertisers differ between search engines, and no single search engine can index all Web pages. The overlap analysis was carried out from October 1, 2018 to October 31, 2018. A framework was built in Java to analyze the overlap among these search engines. This framework eliminates duplicate results and merges the rest into a unified list. It also uses a ranking algorithm to re-rank the search engine results and displays them to the user.
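The following sketch shows the two steps described above: counting how many top results each pair of engines shares, and merging the lists into one de-duplicated, re-ranked list. The abstract does not say which ranking algorithm the Java framework uses. This sketch substitutes a simple Borda count, in which a result at 0-based rank r in a top-`depth` list earns `depth - r` points. That choice, the function names and the example URLs are assumptions for illustration only.

```python
from itertools import combinations


def pairwise_overlap(result_lists):
    """Number of shared URLs for every pair of engines (order within a list ignored)."""
    return {
        (a, b): len(set(result_lists[a]) & set(result_lists[b]))
        for a, b in combinations(sorted(result_lists), 2)
    }


def merge_results(result_lists, depth=10):
    """Merge ranked URL lists into one de-duplicated list using a Borda count.

    A URL at 0-based rank r in an engine's top-`depth` list earns depth - r points;
    a URL returned by several engines accumulates points from each of them.
    """
    scores = {}
    for results in result_lists.values():
        for rank, url in enumerate(results[:depth]):
            scores[url] = scores.get(url, 0) + (depth - rank)
    # Highest total first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))


if __name__ == "__main__":
    results = {
        "Google": ["u1", "u2", "u3"],
        "Bing": ["u2", "u1", "u4"],
        "Yahoo": ["u2", "u5", "u6"],
    }
    print(pairwise_overlap(results))
    print(merge_results(results, depth=3))
```

Borda counting naturally rewards results that several engines agree on. In the example, `u2` appears near the top of all three lists and therefore heads the merged list. A production framework might weight engines differently or use a rank-aware overlap measure instead of a plain count.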
