Research and Implementation of Improved Real-Time Crawler Modeling

2013 ◽  
Vol 312 ◽  
pp. 791-795
Author(s):  
Xiang Lin Zuo ◽  
Wen Bo Wang ◽  
Ying Wang ◽  
Wan Li Zuo

The past decade has witnessed the rapid development of search engines, which have become an indispensable part of everyday life. However, people are no longer satisfied with access to ordinary information and increasingly pay attention to fresh information. This demand poses challenges to traditional search engines, which are more concerned with the relevance and importance of web pages. A search engine comprises three modules: a crawler, an indexer and a searcher. Changes are needed in all three modules to improve a search engine's freshness. This paper investigates the first module, the crawler: we analyze the requirements for a real-time crawler and propose a novel real-time crawler based on more accurate estimation of refresh time. Experimental results demonstrate that the proposed real-time crawler can help a search engine improve its freshness.
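To make the idea of scheduling crawls by estimated refresh time concrete, here is a minimal sketch. It is not the estimator proposed in the paper, which the abstract does not describe: it assumes a naive estimate (elapsed time divided by the number of observed changes), and the names `estimate_change_interval`, `build_schedule`, and the one-hour default are hypothetical.

```python
import heapq


def estimate_change_interval(visit_times, changed_flags, default=3600.0):
    """Naive estimate of the mean number of seconds between page changes.

    visit_times: ascending timestamps of past crawls of one page.
    changed_flags: changed_flags[i] is True if the page differed from the
    previous crawl at visit i (the first flag is ignored).
    Assumption: this simple ratio stands in for the paper's estimator.
    """
    if len(visit_times) < 2:
        return default
    elapsed = visit_times[-1] - visit_times[0]
    changes = sum(1 for changed in changed_flags[1:] if changed)
    if changes == 0:
        # No change seen yet: wait at least as long as we have observed.
        return max(elapsed, default)
    return elapsed / changes


def build_schedule(history):
    """Return a min-heap of (next_visit_time, url), soonest visit first."""
    heap = []
    for url, (times, flags) in history.items():
        interval = estimate_change_interval(times, flags)
        heapq.heappush(heap, (times[-1] + interval, url))
    return heap
```

Popping from the heap yields the page expected to change soonest, so frequently changing pages are revisited more often than static ones.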

2002 ◽  
Vol 63 (4) ◽  
pp. 354-365 ◽  
Author(s):  
Susan Augustine ◽  
Courtney Greene

Have Internet search engines influenced the way students search library Web pages? The results of this usability study reveal that students consistently and frequently use the library Web site’s internal search engine to find information rather than navigating through pages. If students are searching rather than navigating, library Web page designers must make metadata and powerful search engines priorities. The study also shows that students have difficulty interpreting library terminology, experience confusion discerning differences among library resources, and prefer to seek human assistance when encountering problems online. These findings imply that library Web sites have not alleviated some of the basic and long-range problems that have challenged librarians in the past.


2014 ◽  
Vol 2 (2) ◽  
pp. 103-112 ◽  
Author(s):  
Taposh Kumar Neogy ◽  
Harish Paruchuri

The essence of a web page is an inherently predisposed issue, one that is built on behaviors, interests, and intelligence. There are many reasons why web pages are critical to the modern world, and their importance cannot be overemphasized. The meteoric growth of the internet is one of the most potent factors making it hard for search engines to provide actionable results. Search engines store web pages in classified directories. To store these pages, some engines rely on the expertise of real people. Most pages are classified by automated means, but the human factor remains dominant in their success. From experimental results, we can deduce that the most effective and critical way to automate web pages for search engines is via the integration of machine learning.


Author(s):  
Oğuzhan Menemencioğlu ◽  
İlhami Muharrem Orak

The semantic web works on producing machine-readable data and aims to deal with large amounts of data. The most important tool for accessing the data that exists on the web is the search engine. Traditional search engines are insufficient in the face of the amount of data contained in existing web pages. Semantic search engines are extensions of traditional engines and overcome the difficulties they face. This paper summarizes the semantic web, the concepts of traditional and semantic search engines, and their infrastructure. Semantic search approaches are also detailed. A summary of the literature is provided, touching on the trends. In this respect, the types of applications and the areas worked on are considered. Based on data for two different years, trends on these points are analyzed and the impacts of the changes are discussed. The analysis shows that evaluation of the semantic web continues and that new applications and areas are emerging. Multimedia retrieval is a new area of semantic search. Hence, multimedia retrieval approaches are discussed. Text and multimedia retrieval are analyzed within semantic search.


2019 ◽  
Vol 16 (9) ◽  
pp. 3712-3716
Author(s):  
Kailash Kumar ◽  
Abdulaziz Al-Besher

This paper examines the overlap of the results retrieved by three major search engines, namely Google, Yahoo and Bing. A rigorous analysis of overlap among these search engines was conducted on 100 random queries. The first ten web page results, i.e., a hundred results from each search engine, and only non-sponsored results from these search engines were taken into consideration. Search engines have their own frequency of updates and their own ranking of results based on relevance. Moreover, sponsored search advertisers differ between search engines. A single search engine cannot index all Web pages. In this research paper, the overlap analysis of the results was carried out from October 1, 2018 to October 31, 2018. A framework was built in Java to analyze the overlap among these search engines. This framework eliminates the common results and merges them into a unified list. It also uses a ranking algorithm to re-rank the search engine results and displays them back to the user.
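The overlap measurement and the merge-and-re-rank step can be sketched as follows. This is an illustration in Python rather than the authors' Java framework, and the abstract does not name its ranking algorithm: the Borda-style scoring below is an assumption, as are the function names `pairwise_overlap` and `merge_and_rerank`.

```python
from itertools import combinations


def pairwise_overlap(results):
    """Count URLs shared by each pair of engines.

    results: dict mapping engine name -> ranked list of result URLs.
    Returns a dict mapping (engine_a, engine_b) -> number of shared URLs.
    """
    return {
        (a, b): len(set(results[a]) & set(results[b]))
        for a, b in combinations(sorted(results), 2)
    }


def merge_and_rerank(results):
    """Merge all result lists into one de-duplicated, re-ranked list.

    Assumption: Borda-style scoring, where a URL at position p in a list
    of length k earns k - p points from that engine. Ties are broken
    alphabetically so the output is deterministic.
    """
    scores = {}
    for ranked in results.values():
        k = len(ranked)
        for pos, url in enumerate(ranked):
            scores[url] = scores.get(url, 0) + (k - pos)
    return sorted(scores, key=lambda url: (-scores[url], url))
```

A URL returned near the top by several engines accumulates the most points, so agreement between engines pushes a result up the unified list.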


Author(s):  
Jie Zhao ◽  
Jianfei Wang ◽  
Jia Yang ◽  
Peiquan Jin

Company acquisition relations reflect a company's development intent and competitive strategies and are an important type of enterprise competitive intelligence. In the traditional environment, the acquisition of competitive intelligence mainly relies on newspapers, internal reports, and so on, but the rapid development of the Web introduces a new way to extract company acquisition relations. In this paper, the authors study the problem of extracting company acquisition relations from huge numbers of Web pages, and propose a novel algorithm for company acquisition relation extraction. The authors' algorithm considers the tense feature of Web content and classification technology of semantic strength when extracting company acquisition relations from Web pages. It first determines the tense of each sentence in a Web page, which is then applied in sentence classification so as to evaluate the semantic strength of the candidate sentences in describing company acquisition relations. After that, the authors rank the candidate acquisition relations and return the top-k company acquisition relations. They run experiments on 6144 pages crawled through Google, and measure the performance of their algorithm under different metrics. The experimental results show that the algorithm is effective in determining the tense of sentences as well as the company acquisition relations.


2014 ◽  
Vol 16 (1) ◽  
Author(s):  
Eugene B. Visser ◽  
Melius Weideman

Background: Most websites, especially those with a commercial orientation, need a high ranking on a search engine for one or more keywords or phrases. The search engine optimisation process attempts to achieve this. Furthermore, website users expect easy navigation, interaction and transactional ability. The application of website usability principles attempts to achieve this. Ideally, designers should achieve both goals when they design websites. Objectives: This research intended to establish a relationship between search engine optimisation and website usability in order to guide the industry. The authors found a discrepancy between the perceived roles of search engines and website usability. Method: The authors designed three test websites. Each had different combinations of usability, visibility and other attributes. They recorded and analysed the conversions and financial spending on these experimental websites. Finally, they designed a model that fuses search engine optimisation and website usability. Results: Initially, it seemed that website usability and search engine optimisation complemented each other. However, some contradictions between the two, based on content, keywords and their presentation, emerged. Industry experts do not acknowledge these contradictions, although they agree on the existence of the individual elements. The new model highlights the complementary and contradictory aspects. Conclusion: The authors found no evidence of any previous empirical experimental results that could confirm or refute the role of the model. In the fast-paced world of competition between commercial websites, this adds value and originality to the websites of organisations whose websites play important roles.


2010 ◽  
Vol 09 (06) ◽  
pp. 873-888 ◽  
Author(s):  
TZUNG-PEI HONG ◽  
CHING-YAO WANG ◽  
CHUN-WEI LIN

Mining knowledge from large databases has become a critical task for organizations. Managers commonly use the obtained sequential patterns to make decisions. In the past, databases were usually assumed to be static. In real-world applications, however, transactions may be updated. In this paper, a maintenance algorithm for rapidly updating sequential patterns for real-time decision making is proposed. The proposed algorithm utilizes previously discovered large sequences in the maintenance process, thus greatly reducing the number of database rescans and improving performance. Experimental results verify the performance of the proposed approach. The proposed algorithm provides real-time knowledge that can be used for decision making.


2011 ◽  
Vol 460-461 ◽  
pp. 747-753
Author(s):  
Ying Shi Kang ◽  
Hai Ning Wang

With the rapid development of internet technology, focusing on product design for individual users, emphasizing interaction design for the Web and improving the user experience have become an inevitable trend in Web design, and also a hot spot in the design of personalized search engines. This paper proposes an optimized algorithm for building user models for product design websites. In order to show the design dimensions of Web pages presented by a browser, a concept of freshness is introduced in this algorithm. By analyzing users' behavior when browsing Web pages, the model is updated using machine learning methods. Finally, the performance and effectiveness of this algorithm are analyzed and estimated through a simulation experiment.


Author(s):  
FRANCESCO G. B. DE NATALE ◽  
FABRIZIO GRANELLI ◽  
GIANNI VERNAZZA

Texture analysis based on the extraction of contrast features is very effective in terms of both computational complexity and discrimination capability. In this framework, max–min approaches have been proposed in the past as a simple and powerful tool to characterize a statistical texture. In the present work, a method is proposed that exploits the potential of max–min approaches to efficiently solve the problem of detecting local alterations in a uniform statistical texture. Experimental results show a high defect discrimination capability and good suitability for real-time applications, which make the method particularly attractive for the development of industrial visual inspection systems.
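As a rough illustration of a max–min contrast feature, the sketch below computes the difference between the largest and smallest gray level in each non-overlapping block and flags blocks whose contrast departs from the typical value. The abstract does not give the authors' actual feature or decision rule, so the block layout, the median baseline, the `tolerance` parameter, and the function names are all assumptions.

```python
from statistics import median


def block_contrast(image, size):
    """Max-min contrast of each non-overlapping size x size block.

    image: list of equal-length rows of gray levels.
    Returns a dict mapping the block's top-left (row, col) -> max - min.
    """
    rows, cols = len(image), len(image[0])
    contrast = {}
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            window = [
                image[i][j]
                for i in range(r, r + size)
                for j in range(c, c + size)
            ]
            contrast[(r, c)] = max(window) - min(window)
    return contrast


def flag_defects(image, size, tolerance):
    """Return blocks whose contrast differs from the median by > tolerance.

    Assumption: in a uniform texture most blocks share a similar contrast,
    so the median serves as the reference value.
    """
    contrast = block_contrast(image, size)
    typical = median(contrast.values())
    return sorted(
        pos for pos, value in contrast.items() if abs(value - typical) > tolerance
    )
```

Each block needs only one pass to find its maximum and minimum, which is in keeping with the low computational cost the abstract attributes to contrast features.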


2021 ◽  
Author(s):  
Srihari Vemuru ◽  
Eric John ◽  
Shrisha Rao

Humans can easily parse and find answers to complex queries such as "What was the capital of the country of the discoverer of the element which has atomic number 1?" by breaking them up into small pieces, querying these appropriately, and assembling a final answer. However, contemporary search engines lack such capability and fail to handle even slightly complex queries. Search engines process queries by identifying keywords and searching against them in knowledge bases or indexed web pages. The results are, therefore, dependent on the keywords and how well the search engine handles them. In our work, we propose a three-step approach called parsing, tree generation, and querying (PTGQ) for effective searching of larger and more expressive queries of potentially unbounded complexity. PTGQ parses a complex query and constructs a query tree where each node represents a simple query. It then processes the complex query by recursively querying a back-end search engine, going over the corresponding query tree in postorder. Using PTGQ makes sure that the search engine always handles a simpler query containing very few keywords. Results demonstrate that PTGQ can handle queries of much higher complexity than standalone search engines.
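The postorder evaluation of a query tree can be sketched as follows. This covers only the querying step: the parsing and tree-generation steps are omitted and the tree is built by hand. The `QueryNode` class, the `{0}` placeholder convention, and the toy lookup table standing in for a back-end search engine are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class QueryNode:
    """One simple query; "{0}", "{1}", ... are filled with child answers."""
    template: str
    children: list = field(default_factory=list)


def evaluate(node, search):
    """Answer a query tree in postorder.

    Children are answered first, their answers are substituted into the
    parent's template, and only then is the parent sent to `search`, so
    the back end always receives a simple query.
    """
    answers = [evaluate(child, search) for child in node.children]
    return search(node.template.format(*answers))


# Toy stand-in for a back-end search engine (illustrative entries only).
TOY_INDEX = {
    "element with atomic number 1": "hydrogen",
    "discoverer of hydrogen": "Henry Cavendish",
    "country of Henry Cavendish": "England",
    "capital of England": "London",
}

# Hand-built tree for the example query quoted in the abstract.
EXAMPLE_TREE = QueryNode("capital of {0}", [
    QueryNode("country of {0}", [
        QueryNode("discoverer of {0}", [
            QueryNode("element with atomic number 1"),
        ]),
    ]),
])
```

Calling `evaluate(EXAMPLE_TREE, TOY_INDEX.__getitem__)` issues four simple lookups, innermost first, and each answer becomes a keyword in the next query.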

