The Canon of Dutch Literature According to Google

2019 ◽  
Author(s):  
Lucas van der Deijl ◽  
Antal van den Bosch ◽  
Roel Smeets

Literary history is no longer written in books alone. As literary reception thrives in blogs, Wikipedia entries, Amazon reviews, and Goodreads profiles, the Web has become a key platform for the exchange of information on literature. Although conventional printed media in the field—academic monographs, literary supplements, and magazines—may still claim the highest authority, online media presumably provide the first (and possibly the only) source for many readers casually interested in literary history. Wikipedia offers quick and free answers to readers’ questions, and the range of topics described in its entries dramatically exceeds the volume any printed encyclopedia could possibly cover. While an important share of this expanding knowledge base about literature is produced bottom-up (user based and crowd-sourced), search engines such as Google have become brokers in this online economy of knowledge, organizing information on the Web for its users. Similar to printed literary histories, search engines prioritize certain information sources over others when ranking and sorting Web pages; as such, their search algorithms create hierarchies of books, authors, and periods.

2015 ◽  
Vol 12 (1) ◽  
pp. 91-114 ◽  
Author(s):  
Víctor Prieto ◽  
Manuel Álvarez ◽  
Víctor Carneiro ◽  
Fidel Cacheda

Search engines use crawlers to traverse the Web in order to download web pages and build their indexes. Keeping these indexes up to date is essential to ensure the quality of search results. However, changes in web pages are unpredictable, and identifying the moment a web page changes as soon as possible and with minimal computational cost is a major challenge. In this article we present the Web Change Detection system which, in the best case, detects a change almost in real time. In the worst case, it requires, on average, 12 minutes to detect a change on a web site with low PageRank and about one minute on a web site with high PageRank. Meanwhile, current search engines require more than a day, on average, to detect a modification to a web page in both cases.
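The system itself is not published with the abstract, but the core idea of low-cost change detection can be pictured as a polling loop that fingerprints a page and compares hashes between visits, with the polling interval chosen by site importance. In the minimal Python sketch below, the hash choice, the PageRank threshold, and the one-minute and twelve-minute intervals are illustrative assumptions taken from the figures above, not the authors' implementation.

```python
import hashlib
import time
import urllib.request

def page_fingerprint(url: str) -> str:
    """Download a page and return a hash of its raw body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return hashlib.sha256(resp.read()).hexdigest()

def poll_interval(pagerank: float) -> int:
    """Hypothetical policy: check important sites every minute, others every 12 minutes."""
    return 60 if pagerank > 0.5 else 12 * 60

def watch(url: str, pagerank: float, max_checks: int = 10) -> None:
    """Poll a URL and report as soon as its content hash differs from the last visit."""
    last = page_fingerprint(url)
    for _ in range(max_checks):
        time.sleep(poll_interval(pagerank))
        current = page_fingerprint(url)
        if current != last:
            print(f"change detected at {url}")
            last = current

# watch("https://example.com/", pagerank=0.8)
```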


Author(s):  
Mu-Chun Su ◽  
Shao-Jui Wang ◽  
Chen-Ko Huang ◽  
Pa-Chun Wang ◽  
...  

Most of the dramatically increased amount of information available on the World Wide Web is provided via HTML and formatted for human browsing rather than for software programs. This situation calls for a tool that automatically extracts information from semistructured Web information sources, increasing the usefulness of value-added Web services. We present a signal-representation-based parser (SIRAP) that breaks Web pages up into logically coherent groups - for example, groups of information related to an entity. Templates for records with different tag structures are generated incrementally by a Histogram-Based Correlation Coefficient (HBCC) algorithm; records on a Web page are then detected efficiently by matching against the generated templates. Hundreds of Web pages from 17 state-of-the-art search engines were used to demonstrate the feasibility of our approach.
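The abstract does not detail HBCC, but the general idea of comparing candidate records through histograms of their tag structure can be sketched as follows. The tag vocabulary, the sample fragments, and the use of a plain Pearson correlation are assumptions made for illustration; the published algorithm generates templates incrementally and differs in its specifics.

```python
from collections import Counter
from html.parser import HTMLParser
import math

class TagCollector(HTMLParser):
    """Collect the sequence of start tags in an HTML fragment."""
    def __init__(self):
        super().__init__()
        self.tags = []
    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

def tag_histogram(fragment: str, vocab: list[str]) -> list[int]:
    """Count how often each tag of the vocabulary occurs in the fragment."""
    collector = TagCollector()
    collector.feed(fragment)
    counts = Counter(collector.tags)
    return [counts.get(tag, 0) for tag in vocab]

def correlation(x: list[int], y: list[int]) -> float:
    """Pearson correlation between two tag histograms."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

vocab = ["div", "span", "a", "td", "tr", "li", "p"]
rec1 = "<tr><td><a href='#'>x</a></td><td><span>1</span></td></tr>"
rec2 = "<tr><td><a href='#'>y</a></td><td><span>2</span></td></tr>"
# Records with the same tag structure correlate highly and share a template.
print(correlation(tag_histogram(rec1, vocab), tag_histogram(rec2, vocab)))
```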


2007 ◽  
Vol 16 (05) ◽  
pp. 793-828 ◽  
Author(s):  
Juan D. Velásquez ◽  
Vasile Palade

Understanding web users' browsing behaviour in order to adapt a web site to the needs of a particular user is a key issue for many commercial companies that do their business over the Internet. This paper presents the implementation of a Knowledge Base (KB) for building web-based computerized recommender systems. The Knowledge Base consists of a Pattern Repository, containing patterns extracted from web logs and web pages by applying various web mining tools, and a Rule Repository, containing rules that describe how the discovered patterns are used to build navigation or web site modification recommendations. The paper also focuses on testing the effectiveness of the proposed online and offline recommendations. An extensive real-world experiment was carried out on the web site of a bank.
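As a rough illustration of how a knowledge base split into a Pattern Repository and a Rule Repository might be organized, the sketch below pairs mined navigation patterns with rules that emit recommendations. The data classes, the sample pattern, and the single rule are hypothetical stand-ins for the web-mining output described in the paper.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Pattern:
    """A navigation pattern mined from web logs, e.g. a frequent page sequence."""
    pages: tuple[str, ...]
    support: float  # fraction of sessions containing the sequence

@dataclass
class Rule:
    """Maps a matching condition on patterns to a recommendation."""
    condition: Callable[[Pattern], bool]
    recommendation: str

@dataclass
class KnowledgeBase:
    patterns: list[Pattern] = field(default_factory=list)
    rules: list[Rule] = field(default_factory=list)

    def recommend(self) -> list[str]:
        """Apply every rule to every stored pattern and collect recommendations."""
        return [rule.recommendation
                for pattern in self.patterns
                for rule in self.rules
                if rule.condition(pattern)]

kb = KnowledgeBase(
    patterns=[Pattern(("home", "loans", "contact"), support=0.18)],
    rules=[Rule(lambda p: "loans" in p.pages and p.support > 0.1,
                "add a direct link from the home page to the loans section")],
)
print(kb.recommend())
```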


2013 ◽  
Vol 303-306 ◽  
pp. 2311-2316
Author(s):  
Hong Shen Liu ◽  
Peng Fei Wang

The structure and components of research-oriented search engines are presented, with web page analysis identified as the core technology. The paper studies the characteristics of analyzing web pages within a single website: by relating the pages a web crawler retrieved at two different times, the information that changed between the two crawls can be obtained easily. A new method for analyzing web pages within one website is introduced, which analyzes pages on the basis of this changed information. Applying the method shows that it is effective for web page analysis.
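A minimal way to picture the changed information between two crawls of one website is to key each snapshot by URL and diff the two dictionaries. The snapshot contents below are made-up examples; the method proposed in the paper builds on this kind of change information but is not reproduced here.

```python
def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two crawls of the same website, keyed by URL, and report what changed."""
    return {
        "added":   [url for url in new if url not in old],
        "removed": [url for url in old if url not in new],
        "changed": [url for url in new if url in old and new[url] != old[url]],
    }

# Hypothetical page contents captured by the crawler at two different times.
crawl_first  = {"/index.html": "<h1>News</h1>", "/about.html": "<p>About us</p>"}
crawl_second = {"/index.html": "<h1>Breaking news</h1>", "/jobs.html": "<ul>...</ul>"}
print(diff_snapshots(crawl_first, crawl_second))
# {'added': ['/jobs.html'], 'removed': ['/about.html'], 'changed': ['/index.html']}
```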


10.29007/fvc9 ◽  
2019 ◽  
Author(s):  
Gautam Kishore Shahi ◽  
Durgesh Nandini ◽  
Sushma Kumari

Schema.org creates, supports, and maintains schemas for structured data on web pages. For a non-technical author, publishing content in a structured format is difficult. This work presents an automated way of inducing Schema.org markup from the natural-language content of web pages by applying knowledge base creation techniques. Web Data Commons was used as the dataset, and the scope of the experimental part was limited to RDFa. The approach was implemented using the knowledge graph building techniques Knowledge Vault and KnowMore.
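The target of such a pipeline is RDFa markup that annotates extracted fields with Schema.org properties. The sketch below shows only that final rendering step for a hypothetical Article entity; the induction pipeline built on Knowledge Vault and KnowMore is not reproduced here.

```python
from html import escape

def article_rdfa(headline: str, author: str, date_published: str) -> str:
    """Render extracted fields as RDFa markup using the Schema.org Article type."""
    return (
        f'<div vocab="https://schema.org/" typeof="Article">\n'
        f'  <h1 property="headline">{escape(headline)}</h1>\n'
        f'  <span property="author">{escape(author)}</span>\n'
        f'  <time property="datePublished" datetime="{escape(date_published)}">'
        f'{escape(date_published)}</time>\n'
        f'</div>'
    )

# Hypothetical extracted fields for one web page.
print(article_rdfa("Schema.org markup from plain text", "G. K. Shahi", "2019-05-01"))
```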


A web crawler, also called a spider, automatically traverses the WWW for the purpose of web indexing. As the Web grows day by day, the number of web pages worldwide has increased massively. Search engines are therefore essential to make searching practical for users: they are the means by which particular data is discovered on the Web. Without search engines it would be nearly impossible to find anything on the Web unless one already knew a specific URL address. Every search engine maintains a central repository of HTML documents in indexed form. Each time a user submits a query, the search is performed against this database of indexed web pages. The size of each search engine's database depends on the pages available on the Internet, so to increase the efficiency of search engines, only the most relevant and significant pages should be stored in the database.
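A minimal sketch of this idea, assuming a breadth-first crawler and a simple keyword test for relevance (both illustrative choices rather than a description of any particular engine), could look like this:

```python
import re
import urllib.request
from collections import deque
from urllib.parse import urljoin

def crawl(seed: str, is_relevant, max_pages: int = 50) -> dict[str, str]:
    """Breadth-first crawl from a seed URL, keeping only pages judged relevant."""
    index: dict[str, str] = {}
    queue, seen = deque([seed]), {seed}
    while queue and len(index) < max_pages:
        url = queue.popleft()
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except Exception:
            continue
        if is_relevant(html):
            index[url] = html  # only relevant and significant pages enter the repository
        for link in re.findall(r'href="(http[^"]+)"', html):
            target = urljoin(url, link)
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return index

# Hypothetical relevance test: keep pages that mention "search engine".
pages = crawl("https://example.com/", lambda text: "search engine" in text.lower())
```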


2012 ◽  
pp. 50-65 ◽  
Author(s):  
K. Selvakuberan ◽  
M. Indra Devi ◽  
R. Rajaram

The explosive growth of the Web makes it a very useful information resource for all types of users. Today, everyone accesses the Internet for various purposes, and retrieving the required information within a reasonable time is a major demand from users. At the same time, the Internet returns millions of Web pages for each and every search term, so obtaining relevant results from the Web becomes very difficult, and classifying Web pages into relevant categories is an active research topic. Web page classification focuses on assigning documents to categories that search engines then use to produce their results. In this chapter we focus on different machine learning techniques and on how Web pages can be classified using them. Automatic classification of Web pages using machine learning techniques is an efficient way for search engines to provide accurate results to users. Machine learning classifiers may also be trained to protect personal details from unauthenticated users and to support privacy-preserving data mining.
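To make the subject of the chapter concrete, here is a minimal supervised classification sketch using TF-IDF features and a Naive Bayes learner from scikit-learn. The training pages, the labels, and the choice of learner are illustrative assumptions; the chapter itself surveys a range of machine learning techniques.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training data: page text paired with a category label.
pages = [
    "latest football scores and match reports",
    "stock prices and quarterly earnings analysis",
    "new smartphone review with benchmark results",
    "championship schedule and player transfers",
]
labels = ["sports", "finance", "technology", "sports"]

# Vectorize the text and train the classifier in one pipeline.
classifier = make_pipeline(TfidfVectorizer(), MultinomialNB())
classifier.fit(pages, labels)
print(classifier.predict(["live cricket commentary and league table"]))
```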


Author(s):  
Aki Vainio ◽  
Kimmo Salmenjoki

The information content of the Web has, over the last 10 years, shifted from informative to communicative. Web pages, especially homepages, were the foremost places where companies, organizations, and individuals alike expressed their presence online and provided some information about themselves, such as their products, services, or related artefacts. In the common Web environment, search engines harvested this information and made it available and meaningful for the mass of Web users. In the early days of the Web, this factor alone justified the use of the Web as a marketing tool and as an easy way to share important information between collaborating partners.


2018 ◽  
Vol 7 (2.7) ◽  
pp. 359
Author(s):  
Dr Jkr Sastry ◽  
M Sri Harsha Vamsi ◽  
R Srinivas ◽  
G Yeshwanth

Web clients use the Web to search for content by supplying keywords or snippets as input to search engines. A search engine follows a process to collect content and return it as a list of URL links. One can observe that only about 20% of the returned URLs are of use to the end user; the remaining 80% are surfed unnecessarily, wasting time and money. Users exhibit surfing characteristics that can be collected as they browse, and the search process can be made more efficient by making these characteristics and behaviours part and parcel of it. This paper aims at improving the search process by integrating user behaviour into the indexing and ranking of web pages.
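One simple way to picture integrating user behaviour into ranking is to re-rank the engine's output against a per-user click history. The scoring function, the weight, and the sample URLs below are illustrative assumptions, not the indexing and ranking scheme proposed in the paper.

```python
def rerank(results: list[str], click_counts: dict[str, int], weight: float = 0.5) -> list[str]:
    """Blend the engine's original ordering with a user's past click behaviour.

    Lower score means a better position; URLs the user clicked often in earlier
    sessions are promoted above their original rank.
    """
    def score(url: str) -> float:
        return results.index(url) - weight * click_counts.get(url, 0)

    return sorted(results, key=score)

engine_output = ["a.com/loans", "b.com/credit", "c.com/mortgage"]
user_clicks = {"c.com/mortgage": 6}   # hypothetical per-user click history
print(rerank(engine_output, user_clicks))
# ['c.com/mortgage', 'a.com/loans', 'b.com/credit']
```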

