Research and Implementation of Improved Real-Time Crawler Modeling

The past decade has witnessed the rapid development of search engines, which have become an indispensable part of everyday life. However, people are no longer satisfied with accessing ordinary information; they increasingly pay attention to fresh information. This demand poses challenges to traditional search engines, which are concerned mainly with the relevance and importance of web pages. A search engine comprises three modules: the crawler, the indexer and the searcher, and all three need to change to improve a search engine's freshness. This paper investigates the first of these modules, the crawler. We analyze the requirements for a real-time crawler and propose a novel real-time crawler based on a more accurate estimation of refresh time. Experimental results demonstrate that the proposed real-time crawler can help a search engine improve its freshness.
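
The abstract does not give the paper's refresh-time estimator. Purely as a point of reference, the sketch below assumes a common model: each page changes as a Poisson process, its change rate is estimated from fixed-interval revisits with the bias-reduced estimator -ln((n - X + 0.5) / (n + 0.5)) / I, and the next visit is scheduled once the probability that the page has changed reaches a chosen target. `PageStats`, `target_prob` and the priority-queue schedule are hypothetical names, not the authors' crawler.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass
class PageStats:
    """Revisit history for one URL (hypothetical structure)."""
    url: str
    visits: int = 0           # n: number of revisits so far
    changes: int = 0          # X: revisits on which the content had changed
    interval: float = 3600.0  # I: fixed number of seconds between revisits


def estimate_change_rate(stats: PageStats) -> float:
    """Changes per second under a Poisson change model with fixed-interval
    revisits, using the estimator -ln((n - X + 0.5) / (n + 0.5)) / I."""
    n, x = stats.visits, stats.changes
    return -math.log((n - x + 0.5) / (n + 0.5)) / stats.interval


def next_refresh_delay(rate: float, target_prob: float = 0.5) -> float:
    """Seconds until the page has changed with probability target_prob,
    i.e. solve 1 - exp(-rate * t) = target_prob for t."""
    if rate <= 0:
        return math.inf  # no change observed yet, so no estimate to act on
    return -math.log(1.0 - target_prob) / rate


def build_schedule(pages, now=0.0, target_prob=0.5):
    """Min-heap of (due_time, url): the page expected to go stale first pops first."""
    heap = []
    for page in pages:
        delay = next_refresh_delay(estimate_change_rate(page), target_prob)
        heapq.heappush(heap, (now + delay, page.url))
    return heap


if __name__ == "__main__":
    pages = [PageStats("news.example/front", visits=10, changes=9),
             PageStats("docs.example/about", visits=10, changes=1)]
    schedule = build_schedule(pages)
    print(heapq.heappop(schedule)[1])  # news.example/front
```

Lowering `target_prob` makes the crawler revisit sooner (fresher index, more requests); raising it saves bandwidth at the cost of staleness. A production crawler would also need a default rate for pages with no observed changes instead of never revisiting them.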

Discovering How Students Search a Library Web Site: A Usability Case Study

College & Research Libraries, 2002, Vol 63 (4), pp. 354-365. DOI: 10.5860/crl.63.4.354. Cited By ~ 43

Author(s): Susan Augustine, Courtney Greene

Keyword(s): Search Engine, Long Range, Web Sites, Search Engines, Web Pages, Usability Study, Web Page, The Past, Library Resources

Have Internet search engines influenced the way students search library Web pages? The results of this usability study reveal that students consistently and frequently use the library Web site’s internal search engine to find information rather than navigating through its pages. If students are searching rather than navigating, library Web page designers must make metadata and powerful search engines priorities. The study also shows that students have difficulty interpreting library terminology, are confused when distinguishing among library resources, and prefer to seek human assistance when they encounter problems online. These findings imply that library Web sites have not alleviated some of the basic, long-range problems that have challenged librarians in the past.

Machine Learning as a New Search Engine Interface: An Overview

Engineering International, 2014, Vol 2 (2), pp. 103-112. DOI: 10.18034/ei.v2i2.539. Cited By ~ 1

Author(s): Taposh Kumar Neogy, Harish Paruchuri

Keyword(s): Machine Learning, Search Engine, Search Engines, New World, Human Factor, Experimental Results, The Internet, Web Pages, Web Page, Real People

The essence of a web page is an inherently predisposed issue, one that is built on behaviors, interests, and intelligence. There are many reasons why web pages are critical to the new world, and their importance cannot be overemphasized. The meteoric growth of the internet is one of the most potent factors making it hard for search engines to provide actionable results. Search engines store web pages in classified directories, and some engines rely on the expertise of real people to do so. Most engines classify pages by automated means, but the human factor remains dominant in their success. From experimental results, we deduce that the most effective and critical way to automate the classification of web pages for search engines is to integrate machine learning.
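
The abstract does not name a learning method. To make "automated classification of web pages into directories" concrete, the sketch below uses multinomial Naive Bayes with add-one smoothing as a stand-in; the tokenizer, the category names and the training texts are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


class DirectoryClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # category -> number of pages
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.vocab = set()

    def fit(self, labelled_pages):
        for text, category in labelled_pages:
            words = tokenize(text)
            self.doc_counts[category] += 1
            self.word_counts[category].update(words)
            self.vocab.update(words)
        return self

    def predict(self, text):
        """Return the category with the highest log posterior for the page text."""
        total_pages = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best, best_score = None, -math.inf
        for category, n_pages in self.doc_counts.items():
            counts = self.word_counts[category]
            n_words = sum(counts.values())
            score = math.log(n_pages / total_pages)  # log prior
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / (n_words + vocab_size))
            if score > best_score:
                best, best_score = category, score
        return best


if __name__ == "__main__":
    training = [("football match score goal", "sports"),
                ("election vote parliament", "politics"),
                ("goal keeper league", "sports"),
                ("minister vote policy", "politics")]
    clf = DirectoryClassifier().fit(training)
    print(clf.predict("league goal tonight"))  # sports
```

Human editors would still supply the labelled pages and the directory taxonomy; the model only generalizes from them, which fits the abstract's point that the human factor stays important.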

A Review on Semantic Text and Multimedia Retrieval and Recent Trends

International Journal of Multimedia Data Engineering and Management, 2015, Vol 6 (1), pp. 54-74. DOI: 10.4018/ijmdem.2015010104

Author(s): Oğuzhan Menemencioğlu, İlhami Muharrem Orak

Keyword(s): Semantic Web, Search Engine, Search Engines, Semantic Search, Multimedia Retrieval, Web Pages, The Face, Recent Trends, New Applications, Machine Readable

The semantic web works on producing machine-readable data and aims to deal with large amounts of data. The most important tool for accessing the data on the web is the search engine. Traditional search engines are insufficient in the face of the amount of data contained in existing web pages. Semantic search engines are extensions of traditional engines and overcome the difficulties they face. This paper summarizes the semantic web, the concepts of traditional and semantic search engines, and their infrastructure, and details semantic search approaches. A summary of the literature is provided, touching on current trends; in this respect, the types of applications and the areas worked on are considered. Based on data from two different years, trends on these points are analyzed and the impacts of changes are discussed. The analysis shows that work on the semantic web continues and that new applications and areas are emerging. Multimedia retrieval is a new scope for semantic approaches, so multimedia retrieval approaches are also discussed, and text and multimedia retrieval are analyzed within semantic search.
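
One way to see why machine-readable data lets semantic search go beyond keyword matching is a toy triple store. The sketch below is an illustration under assumptions, not an RDF library or the paper's method: the triples, the `subClassOf`/`depicts`/`mentions` predicates and the file names are hypothetical. A query for a concept also returns text and images annotated with narrower concepts, found by following `subClassOf` links.

```python
from collections import deque

# Hypothetical toy knowledge base of (subject, predicate, object) triples.
TRIPLES = [
    ("Cat", "subClassOf", "Animal"),
    ("Dog", "subClassOf", "Animal"),
    ("Animal", "subClassOf", "LivingThing"),
    ("img1.jpg", "depicts", "Cat"),
    ("img2.jpg", "depicts", "Dog"),
    ("doc1.txt", "mentions", "Animal"),
    ("img3.jpg", "depicts", "Car"),
]

ANNOTATION_PREDICATES = ("depicts", "mentions")


def narrower_concepts(concept, triples):
    """The concept plus every class that is (transitively) a subclass of it."""
    found = {concept}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        for s, p, o in triples:
            if p == "subClassOf" and o == current and s not in found:
                found.add(s)
                queue.append(s)
    return found


def keyword_search(concept, triples):
    """Exact-label matching only, as a purely keyword-based engine would do."""
    return sorted({s for s, p, o in triples
                   if p in ANNOTATION_PREDICATES and o == concept})


def semantic_search(concept, triples):
    """Match resources annotated with the concept or any narrower concept."""
    targets = narrower_concepts(concept, triples)
    return sorted({s for s, p, o in triples
                   if p in ANNOTATION_PREDICATES and o in targets})


if __name__ == "__main__":
    print(keyword_search("Animal", TRIPLES))   # ['doc1.txt']
    print(semantic_search("Animal", TRIPLES))  # ['doc1.txt', 'img1.jpg', 'img2.jpg']
```

The same lookup covers text (`mentions`) and multimedia (`depicts`) resources, which is the sense in which semantic annotations put both kinds of retrieval under one search.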

Critical Analysis of Major Search Engines

Journal of Computational and Theoretical Nanoscience, 2019, Vol 16 (9), pp. 3712-3716. DOI: 10.1166/jctn.2019.8239

Author(s): Kailash Kumar, Abdulaziz Al-Besher

Keyword(s): Search Engine, Search Engines, Critical Analysis, Research Paper, Web Pages, Ranking Algorithm

Extracting Top-k Company Acquisition Relations From the Web

International Journal on Semantic Web and Information Systems, 2017, Vol 13 (4), pp. 27-41. DOI: 10.4018/ijswis.2017100102. Cited By ~ 1

Author(s): Jie Zhao, Jianfei Wang, Jia Yang, Peiquan Jin

Keyword(s): Rapid Development, Relation Extraction, Experimental Results, Competitive Intelligence, Web Pages, Web Content, Web Page, Competitive Strategies, The Web, Novel Algorithm

A company acquisition relation reflects a company's development intent and competitive strategies, and is an important type of enterprise competitive intelligence. In the traditional environment, competitive intelligence is gathered mainly from newspapers, internal reports, and similar sources, but the rapid development of the Web offers a new way to extract company acquisition relations. In this paper, the authors study the problem of extracting company acquisition relations from huge numbers of Web pages and propose a novel algorithm for this task. The algorithm takes into account the tense of Web content and uses classification to assess semantic strength when extracting company acquisition relations from Web pages. It first determines the tense of each sentence in a Web page; the tense is then used in sentence classification to evaluate how strongly each candidate sentence describes a company acquisition relation. The authors then rank the candidate acquisition relations and return the top-k. They run experiments on 6144 pages crawled through Google and measure the algorithm's performance under different metrics. The experimental results show that the algorithm is effective both in determining the tense of sentences and in extracting company acquisition relations.
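
The three stages described above (tense per sentence, tense-aware semantic strength, top-k ranking) can be sketched as follows. This is only an illustration under stated assumptions: the regular-expression tense cues and the `TENSE_WEIGHT` values are hypothetical, and the paper's trained sentence classifier is replaced by a `strength` score in [0, 1] supplied with each candidate.

```python
import heapq
import re
from collections import defaultdict

# Hypothetical tense cues; the paper's tense classifier is not reproduced here.
PAST_CUES = re.compile(r"\b(acquired|bought|completed|closed)\b", re.I)
FUTURE_CUES = re.compile(r"\b(will|plans? to|intends? to|is set to|may)\b", re.I)

# Hypothetical weights: completed deals count more than ongoing or planned ones.
TENSE_WEIGHT = {"past": 1.0, "present": 0.7, "future": 0.4}


def sentence_tense(sentence):
    """Very rough cue-based tense label: past, future, or present."""
    if PAST_CUES.search(sentence):
        return "past"
    if FUTURE_CUES.search(sentence):
        return "future"
    return "present"


def top_k_acquisitions(candidates, k=3):
    """candidates: iterable of (acquirer, target, sentence, strength), where
    strength is a classifier's confidence that the sentence describes an
    acquisition. Scores are summed per (acquirer, target) pair."""
    scores = defaultdict(float)
    for acquirer, target, sentence, strength in candidates:
        scores[(acquirer, target)] += strength * TENSE_WEIGHT[sentence_tense(sentence)]
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    candidates = [
        ("A", "B", "A acquired B last year.", 0.9),
        ("A", "B", "A completed its purchase of B.", 0.8),
        ("C", "D", "C will buy D.", 0.9),
        ("E", "F", "E is buying F.", 0.5),
    ]
    for pair, score in top_k_acquisitions(candidates, k=2):
        print(pair, round(score, 2))
```

Summing evidence per company pair means a relation confirmed by several pages outranks one mentioned once, which is one plausible reading of "rank the candidate acquisition relations".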

Fusing website usability and search engine optimisation

SA Journal of Information Management, 2014, Vol 16 (1). DOI: 10.4102/sajim.v16i1.577. Cited By ~ 1

Author(s): Eugene B. Visser, Melius Weideman

Keyword(s): Search Engine, Search Engines, Experimental Results, New Model, High Ranking, Website Usability, The Individual

Background: Most websites, especially those with a commercial orientation, need a high ranking on a search engine for one or more keywords or phrases. The search engine optimisation process attempts to achieve this. Furthermore, website users expect easy navigation, interaction and transactional ability. The application of website usability principles attempts to achieve this. Ideally, designers should achieve both goals when they design websites.

Objectives: This research aimed to establish a relationship between search engine optimisation and website usability in order to guide the industry. The authors found a discrepancy between the perceived roles of search engines and website usability.

Method: The authors designed three test websites, each with a different combination of usability, visibility and other attributes. They recorded and analysed the conversions and financial spending on these experimental websites. Finally, they designed a model that fuses search engine optimisation and website usability.

Results: Initially, it seemed that website usability and search engine optimisation complemented each other. However, some contradictions between the two emerged, based on content, keywords and their presentation. Industry experts do not acknowledge these contradictions, although they agree on the existence of the individual elements. The new model highlights both the complementary and the contradictory aspects.

Conclusion: The authors found no evidence of any previous empirical experimental results that could confirm or refute the role of the model. In the fast-paced world of competition between commercial websites, this work adds value and originality for organisations whose websites play important roles.

Providing Timely Updated Sequential Patterns in Decision Making

International Journal of Information Technology & Decision Making, 2010, Vol 09 (06), pp. 873-888. DOI: 10.1142/s0219622010004147. Cited By ~ 4

Author(s): Tzung-Pei Hong, Ching-Yao Wang, Chun-Wei Lin

Keyword(s): Decision Making, Real Time, Real World, Experimental Results, Sequential Patterns, The Past, Large Databases, Real World Applications, Critical Task, Maintenance Process

Mining knowledge from large databases has become a critical task for organizations. Managers commonly use the obtained sequential patterns to make decisions. In the past, databases were usually assumed to be static. In real-world applications, however, transactions may be updated. In this paper, a maintenance algorithm for rapidly updating sequential patterns for real-time decision making is proposed. The proposed algorithm utilizes previously discovered large sequences in the maintenance process, thus greatly reducing the number of database rescans and improving performance. Experimental results verify the performance of the proposed approach. The proposed algorithm provides real-time knowledge that can be used for decision making.
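
The core idea of reusing previously discovered large sequences can be sketched as incremental support counting: when new customer sequences arrive, only those new sequences are scanned to update the stored counts. This is a simplified sketch, not the authors' algorithm; `SequentialPatternMaintainer` is a hypothetical name, and sequences that were not large before are not tracked here, so a real maintenance algorithm still needs a rule for deciding when a full rescan becomes necessary.

```python
def contains(sequence, pattern):
    """True if every itemset of pattern is a subset of a distinct element of
    sequence, in order (greedy earliest matching is sufficient for this)."""
    i = 0
    for element in sequence:
        if i < len(pattern) and pattern[i] <= element:
            i += 1
    return i == len(pattern)


class SequentialPatternMaintainer:
    """Keeps support counts of previously discovered sequences and updates them
    from newly inserted customer sequences only, without rescanning old data."""

    def __init__(self, known_counts, db_size, min_support):
        self.counts = dict(known_counts)  # pattern -> support count in the database
        self.db_size = db_size            # number of customer sequences seen so far
        self.min_support = min_support    # minimum support as a fraction

    def insert(self, new_sequences):
        new_sequences = list(new_sequences)
        for seq in new_sequences:
            for pattern in self.counts:
                if contains(seq, pattern):
                    self.counts[pattern] += 1
        self.db_size += len(new_sequences)

    def large_sequences(self):
        """Patterns whose updated support still meets the minimum support."""
        threshold = self.min_support * self.db_size
        return {p for p, c in self.counts.items() if c >= threshold}


if __name__ == "__main__":
    fs = frozenset
    a, ab = (fs({"a"}),), (fs({"a"}), fs({"b"}))
    m = SequentialPatternMaintainer({a: 6, ab: 5}, db_size=10, min_support=0.5)
    m.insert([(fs({"a"}), fs({"c"})), (fs({"c"}),)])
    print(m.large_sequences() == {a})  # True: <a> stays large, <a, b> drops out
```

Each sequence is a tuple of itemsets (frozensets), so `pattern[i] <= element` is a subset test; the cost of an update is proportional to the new data times the number of stored patterns rather than to the whole database.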

User Model of Personalized Search Engine for Product Design Based on Machine Learning

Key Engineering Materials, 2011, Vol 460-461, pp. 747-753. DOI: 10.4028/www.scientific.net/kem.460-461.747

Author(s): Ying Shi Kang, Hai Ning Wang

Keyword(s): Machine Learning, Product Design, Search Engine, Interaction Design, Hot Spot, Rapid Development, Web Design, Internet Technology, Web Pages, Personalized Search

With the rapid development of internet technology, focusing on the product design of individual users, emphasizing interaction design for the Web and improving the user experience have become an inevitable trend in Web design and a hot spot in the design of personalized search engines. This paper proposes an optimized algorithm for building user models for product design websites. To represent the design dimensions of Web pages presented by a browser, the algorithm introduces a concept of freshness. By analyzing users' behavior when browsing Web pages, the model is updated using machine learning methods. Finally, the performance and effectiveness of the algorithm were analyzed and evaluated through simulation experiments.
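
The abstract does not define its freshness concept or its learning rule. The sketch below assumes one simple interpretation: a per-user weight for each design dimension that decays exponentially with time (older interest is less "fresh") and is reinforced online by each browsing event. `UserInterestModel`, the half-life, the dimension names and the use of dwell time as evidence are all hypothetical.

```python
import math


class UserInterestModel:
    """Per-user weights over design dimensions, with exponential time decay
    standing in for 'freshness' (an assumption, not the paper's definition)."""

    def __init__(self, half_life_days: float = 7.0):
        self.decay = math.log(2) / half_life_days  # weight halves every half-life
        self.weights = {}  # dimension -> (weight, day of last update)

    def score(self, dim: str, day: float) -> float:
        """Current (decayed) interest in a dimension."""
        weight, last = self.weights.get(dim, (0.0, day))
        return weight * math.exp(-self.decay * (day - last))

    def observe(self, dims, day: float, dwell: float = 1.0) -> None:
        """Online update from one browsing event: decay the old weight, then add
        the evidence (e.g. dwell time) for each dimension the page exhibits."""
        for dim in dims:
            self.weights[dim] = (self.score(dim, day) + dwell, day)

    def top(self, day: float, n: int = 3):
        """Dimensions ranked by current interest."""
        scores = {dim: self.score(dim, day) for dim in self.weights}
        return sorted(scores, key=scores.get, reverse=True)[:n]


if __name__ == "__main__":
    model = UserInterestModel(half_life_days=7.0)
    model.observe(["color", "material"], day=0, dwell=4.0)
    model.observe(["shape"], day=14, dwell=1.5)
    print(model.top(day=14))  # ['shape', 'color', 'material']
```

After two half-lives the early interest in `color` has decayed from 4.0 to 1.0, so the more recent `shape` signal ranks first even though it was weaker when observed.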

A Contrast-Based Approach to the Identification of Texture Faults

International Journal of Pattern Recognition and Artificial Intelligence, 2002, Vol 16 (02), pp. 193-214. DOI: 10.1142/s0218001402001617

Author(s): Francesco G. B. De Natale, Fabrizio Granelli, Gianni Vernazza

Keyword(s): Computational Complexity, Real Time, Texture Analysis, Visual Inspection, Experimental Results, Discrimination Capability, The Past, Inspection Systems, Real Time Applications, Good Attitude

Texture analysis based on the extraction of contrast features is very effective in terms of both computational complexity and discrimination capability. In this framework, max–min approaches have been proposed in the past as a simple and powerful tool for characterizing a statistical texture. In the present work, a method is proposed that exploits the potential of max–min approaches to efficiently detect local alterations in a uniform statistical texture. Experimental results show a high defect discrimination capability and a good aptitude for real-time applications, which make the method particularly attractive for developing industrial visual inspection systems.
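
The abstract does not spell out the authors' max–min operator. As a loose illustration of contrast-based defect detection, the sketch below computes a max–min contrast (brightest minus darkest pixel) for each non-overlapping block and flags blocks whose contrast departs from the median block contrast. The block size, the `tolerance` rule and the median reference are hypothetical choices, not the paper's method.

```python
from statistics import median


def block_contrast(image, top, left, size):
    """Max-min contrast of one size x size block: brightest minus darkest pixel."""
    values = [image[r][c]
              for r in range(top, top + size)
              for c in range(left, left + size)]
    return max(values) - min(values)


def detect_defects(image, size=4, tolerance=0.5):
    """Return top-left corners of blocks whose contrast deviates from the median
    block contrast by more than tolerance * median (hypothetical rule)."""
    rows, cols = len(image), len(image[0])
    contrasts = {
        (r, c): block_contrast(image, r, c, size)
        for r in range(0, rows - size + 1, size)
        for c in range(0, cols - size + 1, size)
    }
    reference = median(contrasts.values())  # robust to a few defective blocks
    return sorted(pos for pos, v in contrasts.items()
                  if abs(v - reference) > tolerance * reference)


if __name__ == "__main__":
    # 8x8 checkerboard texture (values 10/20) with one bright defect pixel.
    img = [[10 + 10 * ((r + c) % 2) for c in range(8)] for r in range(8)]
    img[5][6] = 200
    print(detect_defects(img))  # [(4, 4)]
```

Each block needs only one max and one min over its pixels, which is what makes contrast features cheap enough for real-time inspection; the median reference assumes defects cover a minority of the blocks.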

Handling Complex Queries Using Query Trees

2021. DOI: 10.36227/techrxiv.14845212

Author(s): Srihari Vemuru, Eric John, Shrisha Rao

Keyword(s): Search Engine, Search Engines, Knowledge Bases, Web Pages, Complex Query, Complex Queries, Tree Generation, Query Tree, Final Answer, Simple Query

Humans can easily parse and answer complex queries such as "What was the capital of the country of the discoverer of the element which has atomic number 1?" by breaking them into small pieces, querying each piece appropriately, and assembling a final answer. Contemporary search engines, however, lack such a capability and fail to handle even slightly complex queries. Search engines process queries by identifying keywords and searching for them in knowledge bases or indexed web pages, so the results depend on the keywords and on how well the search engine handles them. In our work, we propose a three-step approach called parsing, tree generation, and querying (PTGQ) for effectively searching larger and more expressive queries of potentially unbounded complexity. PTGQ parses a complex query and constructs a query tree in which each node represents a simple query. It then processes the complex query by recursively querying a back-end search engine, traversing the query tree in postorder. PTGQ thus ensures that the search engine always handles a simpler query containing very few keywords. Results demonstrate that PTGQ can handle queries of much higher complexity than standalone search engines.
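
The querying step described above (a tree of simple queries answered in postorder against a back-end engine) can be sketched as below. The parsing and tree-generation steps are not shown: the tree for the example query is built by hand, `QueryNode` and the `{}` placeholder convention are hypothetical, and `FAKE_ENGINE` is a fixed lookup table standing in for a real search engine.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class QueryNode:
    """One simple query; '{}' placeholders are filled with child answers in order."""
    template: str
    children: List["QueryNode"] = field(default_factory=list)


def evaluate(node: QueryNode, search: Callable[[str], str]) -> str:
    """Answer the tree in postorder: children first, then this node's query."""
    child_answers = [evaluate(child, search) for child in node.children]
    return search(node.template.format(*child_answers))


# Hypothetical stand-in for a back-end search engine: a fixed lookup table.
FAKE_ENGINE = {
    "element with atomic number 1": "hydrogen",
    "discoverer of hydrogen": "Henry Cavendish",
    "country of Henry Cavendish": "Great Britain",
    "capital of Great Britain": "London",
}

# Hand-built query tree for "What was the capital of the country of the
# discoverer of the element which has atomic number 1?"
TREE = QueryNode("capital of {}", [
    QueryNode("country of {}", [
        QueryNode("discoverer of {}", [
            QueryNode("element with atomic number 1"),
        ]),
    ]),
])

if __name__ == "__main__":
    issued = []

    def logged_search(query: str) -> str:
        issued.append(query)
        return FAKE_ENGINE[query]

    print(evaluate(TREE, logged_search))  # London
    print(issued)  # leaf query first, root query last
```

Every string sent to `search` is a short query with a single unknown already resolved, which is the property the abstract relies on: the back-end engine never sees the full nested question.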
