scholarly journals MACHINE LEARNING BASED SEARCH ENGINE WITH CRAWLING, INDEXING AND RANKING

Author(s):  
Anuradha T ◽  
Tayyaba Nousheen

The web is the heap and huge collection of wellspring of data. The Search Engine are used for retrieving the information from World Wide Web (WWW). Search Engines are helpful for searching user keywords and provide the accurate result in fraction of seconds. This paper proposed Machine Learning based search engine which will give more relevant user searches in the form of web pages. To display the user entered query search engine plays a major role of basic interface. Every site comprises of the heaps of site pages that are being made and sent on the server.

2005 ◽  
Vol 277-279 ◽  
pp. 361-368
Author(s):  
Soo Sun Cho ◽  
Dong Won Han ◽  
Chi Jung Hwang

Redundant images currently abundant in World Wide Web pages need to be removed in order to transform or simplify the Web pages for suitable display in small-screened devices. Classifying removable images on the Web pages according to their uniqueness of content will allow simpler representation of Web pages. For such classification, machine learning based methods can be used to categorize images into two groups; eliminable and non-eliminable. We use two representative learning methods, the Naïve Bayesian classifier and C4.5 decision trees. For our Web image classification, we propose new features that have expressive power for Web images to be classified. We apply image samples to the two classifiers and analyze the results. In addition, we propose an algorithm to construct an optimized subset from a whole feature set, which includes most influential features for the purposes of classification. By using the optimized feature set, the accuracy of classification is found to improve markedly.


2018 ◽  
pp. 742-748
Author(s):  
Viveka Vardhan Jumpala

The Internet, which is an information super high way, has practically compressed the world into a cyber colony through various networks and other Internets. The development of the Internet and the emergence of the World Wide Web (WWW) as common vehicle for communication and instantaneous access to search engines and databases. Search Engine is designed to facilitate search for information on the WWW. Search Engines are essentially the tools that help in finding required information on the web quickly in an organized manner. Different search engines do the same job in different ways thus giving different results for the same query. Search Strategies are the new trend on the Web.


Author(s):  
Rimpal Unadkat

The World Wide Web (WWW) allows the people to share the information (data) from the large database repositories globally. The tremendous growth in the volume of data and with the terrific growth of number of web pages, traditional search engines now days are not appropriate and not suitable anymore. Search engine is the most important tool to discover any information in World Wide Web. Semantic Search Engine is born of traditional search engine to overcome the above problem. However, to overcome this problem in search engines to retrieve meaningful information intelligently, semantic web technologies are playing a major role. In this paper the authors present survey on the role of search engines in intelligent web, Background, Challenges and some issues.


Author(s):  
Rimpal Unadkat

The World Wide Web (WWW) allows the people to share the information (data) from the large database repositories globally. The tremendous growth in the volume of data and with the terrific growth of number of web pages, traditional search engines now days are not appropriate and not suitable anymore. Search engine is the most important tool to discover any information in World Wide Web. Semantic Search Engine is born of traditional search engine to overcome the above problem. However, to overcome this problem in search engines to retrieve meaningful information intelligently, semantic web technologies are playing a major role. In this paper the authors present survey on the role of search engines in intelligent web, Background, Challenges and some issues.


Author(s):  
Viveka Vardhan Jumpala

The Internet, which is an information super high way, has practically compressed the world into a cyber colony through various networks and other Internets. The development of the Internet and the emergence of the World Wide Web (WWW) as common vehicle for communication and instantaneous access to search engines and databases. Search Engine is designed to facilitate search for information on the WWW. Search Engines are essentially the tools that help in finding required information on the web quickly in an organized manner. Different search engines do the same job in different ways thus giving different results for the same query. Search Strategies are the new trend on the Web.


Author(s):  
Vijay Kasi ◽  
Radhika Jain

In the context of the Internet, a search engine can be defined as a software program designed to help one access information, documents, and other content on the World Wide Web. The adoption and growth of the Internet in the last decade has been unprecedented. The World Wide Web has always been applauded for its simplicity and ease of use. This is evident looking at the extent of the knowledge one requires to build a Web page. The flexible nature of the Internet has enabled the rapid growth and adoption of it, making it hard to search for relevant information on the Web. The number of Web pages has been increasing at an astronomical pace, from around 2 million registered domains in 1995 to 233 million registered domains in 2004 (Consortium, 2004). The Internet, considered a distributed database of information, has the CRUD (create, retrieve, update, and delete) rule applied to it. While the Internet has been effective at creating, updating, and deleting content, it has considerably lacked in enabling the retrieval of relevant information. After all, there is no point in having a Web page that has little or no visibility on the Web. Since the 1990s when the first search program was released, we have come a long way in terms of searching for information. Although we are currently witnessing a tremendous growth in search engine technology, the growth of the Internet has overtaken it, leading to a state in which the existing search engine technology is falling short. When we apply the metrics of relevance, rigor, efficiency, and effectiveness to the search domain, it becomes very clear that we have progressed on the rigor and efficiency metrics by utilizing abundant computing power to produce faster searches with a lot of information. Rigor and efficiency are evident in the large number of indexed pages by the leading search engines (Barroso, Dean, & Holzle, 2003). However, more research needs to be done to address the relevance and effectiveness metrics. Users typically type in two to three keywords when searching, only to end up with a search result having thousands of Web pages! This has made it increasingly hard to effectively find any useful, relevant information. Search engines face a number of challenges today requiring them to perform rigorous searches with relevant results efficiently so that they are effective. These challenges include the following (“Search Engines,” 2004). 1. The Web is growing at a much faster rate than any present search engine technology can index. 2. Web pages are updated frequently, forcing search engines to revisit them periodically. 3. Dynamically generated Web sites may be slow or difficult to index, or may result in excessive results from a single Web site. 4. Many dynamically generated Web sites are not able to be indexed by search engines. 5. The commercial interests of a search engine can interfere with the order of relevant results the search engine shows. 6. Content that is behind a firewall or that is password protected is not accessible to search engines (such as those found in several digital libraries).1 7. Some Web sites have started using tricks such as spamdexing and cloaking to manipulate search engines to display them as the top results for a set of keywords. This can make the search results polluted, with more relevant links being pushed down in the result list. This is a result of the popularity of Web searches and the business potential search engines can generate today. 8. Search engines index all the content of the Web without any bounds on the sensitivity of information. This has raised a few security and privacy flags. With the above background and challenges in mind, we lay out the article as follows. In the next section, we begin with a discussion of search engine evolution. To facilitate the examination and discussion of the search engine development’s progress, we break down this discussion into the three generations of search engines. Figure 1 depicts this evolution pictorially and highlights the need for better search engine technologies. Next, we present a brief discussion on the contemporary state of search engine technology and various types of content searches available today. With this background, the next section documents various concerns about existing search engines setting the stage for better search engine technology. These concerns include information overload, relevance, representation, and categorization. Finally, we briefly address the research efforts under way to alleviate these concerns and then present our conclusion.


Web Services ◽  
2019 ◽  
pp. 2138-2143
Author(s):  
Rimpal Unadkat

The World Wide Web (WWW) allows the people to share the information (data) from the large database repositories globally. The tremendous growth in the volume of data and with the terrific growth of number of web pages, traditional search engines now days are not appropriate and not suitable anymore. Search engine is the most important tool to discover any information in World Wide Web. Semantic Search Engine is born of traditional search engine to overcome the above problem. However, to overcome this problem in search engines to retrieve meaningful information intelligently, semantic web technologies are playing a major role. In this paper the authors present survey on the role of search engines in intelligent web, Background, Challenges and some issues.


Author(s):  
Punam Bedi ◽  
Neha Gupta ◽  
Vinita Jindal

The World Wide Web is a part of the Internet that provides data dissemination facility to people. The contents of the Web are crawled and indexed by search engines so that they can be retrieved, ranked, and displayed as a result of users' search queries. These contents that can be easily retrieved using Web browsers and search engines comprise the Surface Web. All information that cannot be crawled by search engines' crawlers falls under Deep Web. Deep Web content never appears in the results displayed by search engines. Though this part of the Web remains hidden, it can be reached using targeted search over normal Web browsers. Unlike Deep Web, there exists a portion of the World Wide Web that cannot be accessed without special software. This is known as the Dark Web. This chapter describes how the Dark Web differs from the Deep Web and elaborates on the commonly used software to enter the Dark Web. It highlights the illegitimate and legitimate sides of the Dark Web and specifies the role played by cryptocurrencies in the expansion of Dark Web's user base.


Author(s):  
Adélia Gouveia ◽  
Jorge Cardoso

The World Wide Web (WWW) emerged in 1989, developed by Tim Berners-Lee who proposed to build a system for sharing information among physicists of the CERN (Conseil Européen pour la Recherche Nucléaire), the world’s largest particle physics laboratory. Currently, the WWW is primarily composed of documents written in HTML (hyper text markup language), a language that is useful for visual presentation (Cardoso & Sheth, 2005). HTML is a set of “markup” symbols contained in a Web page intended for display on a Web browser. Most of the information on the Web is designed only for human consumption. Humans can read Web pages and understand them, but their inherent meaning is not shown in a way that allows their interpretation by computers (Cardoso & Sheth, 2006). Since the visual Web does not allow computers to understand the meaning of Web pages (Cardoso, 2007), the W3C (World Wide Web Consortium) started to work on a concept of the Semantic Web with the objective of developing approaches and solutions for data integration and interoperability purpose. The goal was to develop ways to allow computers to understand Web information. The aim of this chapter is to present the Web ontology language (OWL) which can be used to develop Semantic Web applications that understand information and data on the Web. This language was proposed by the W3C and was designed for publishing, sharing data and automating data understood by computers using ontologies. To fully comprehend OWL we need first to study its origin and the basic blocks of the language. Therefore, we will start by briefly introducing XML (extensible markup language), RDF (resource description framework), and RDF Schema (RDFS). These concepts are important since OWL is written in XML and is an extension of RDF and RDFS.


Author(s):  
Kevin Curran ◽  
Gary Gumbleton

Tim Berners-Lee, director of the World Wide Web Consortium (W3C), states that, “The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation” (Berners-Lee, 2001). The Semantic Web will bring structure to the meaningful content of Web pages, creating an environment where software agents, roaming from page to page, can readily carry out sophisticated tasks for users. The Semantic Web (SW) is a vision of the Web where information is more efficiently linked up in such a way that machines can more easily process it. It is generating interest not just because Tim Berners-Lee is advocating it, but because it aims to solve the problem of information being hidden away in HTML documents, which are easy for humans to get information out of but are difficult for machines to do so. We will discuss the Semantic Web here.


Sign in / Sign up

Export Citation Format

Share Document